We present a statistical mechanics approach to the protein folding problem. We first review some
of the basic properties of proteins, and introduce some physical models to describe their
thermodynamics. These models rely on a random heteropolymeric description of these non-random
biomolecules. Various kinds of randomness are investigated, and the connection with disordered
systems is discussed. We conclude with a brief study of the dynamics of proteins.



Natural proteins have the property of folding into an (almost) unique compact native
structure, which is of biological interest. The compactness of this unique native state
is largely due to the existence of an optimal amount of hydrophobic amino-acid residues,
since these biological objects are usually designed to work in water. The task of predicting
the three-dimensional conformation from the linear primary sequence is
often referred to as the protein folding problem. Already at this stage, a rigorous analytical
theory appears difficult, since it amounts to studying a mesoscopic system (most proteins
have between 100 and 500 residues), notwithstanding the solvent's properties.

This mesoscopic system is of a classical nature; the quantum mechanical valence 
electrons of the atoms induce interactions between the heavy "nuclei" which can then 
be treated as classical objects interacting through classical many-body interactions. This 
observation forms the basis of Molecular Dynamics, Monte Carlo, or Statistical Mechanical 
models. 

To further complicate the matter, both the compactness and the chemical heterogeneity
of a given protein tend to slow down dynamical processes: the question of the kinetic
control of the folding (as opposed to a thermodynamic control) is therefore periodically
asked. This remark suggests that the folding process may have something in common with
the physics of glassy systems, where competing interactions (frustration) and/or disorder 
lead to a very rugged phase space, resulting in slow dynamical processes. In this review, 
we will assume that the folded state of proteins is thermodynamically stable (the folded 
state is the state of minimal free energy). 

There has been a considerable amount of numerical simulation of proteins, using
molecular dynamics or Monte Carlo calculations (for reviews, see the literature). This promising
approach will not be discussed in this review. We will use only analytical approaches
throughout. Similarly, models emphasizing the micro-crystalline character of the folded
proteins will not be addressed here.

The outline of this review is the following. In the first section, we introduce at an 
elementary level, some notions of the physics, chemistry and biology of proteins. In the 
second section, we study (using statistical physics methods) some heteropolymer models 
which are possibly relevant to the protein folding problem. The phase diagrams of these 
models bear some qualitative resemblance to the real systems. 

In the last section, we tackle the issues of dynamics. In view of the complexity of 
the problem, we first study the homopolymer case. The heteropolymeric aspect is then
studied at a more phenomenological level. 

1 Biophysical background 



For a review on the biological aspects of protein, the reader is referred to the books 




1.1 Elements of Chemistry 

Proteins are biological molecules, present in any living organism. Their biological functions
include catalysis (enzymes), transport of ions (hemoglobin, chlorophyll, etc.), muscle
contraction, etc. They are also present in virus shells, prions, etc. The biological activity
involves a small set of atoms, called the "active site" of the protein, where chemical 
reactions take place. 

Proteins belong to the group of biopolymers, which also comprise nucleic acids (DNA, 
RNA) and polysaccharides. From a physico-chemical point of view, biopolymers are 
heteropolymers, made out of different species of monomers. For proteins, the monomers 
are amino-acids, chosen from twenty different species. 

The chemical formula of an amino-acid can be written as: 

$$\mathrm{NH_2 - C_\alpha HR - COOH} \qquad (1)$$

where NH2 is the amine group and COOH is the acidic group (except for proline, which
has an imine group). Each amino acid is characterized by its residue R. If the residue
is not reduced to a hydrogen atom (as it is in glycine), the alpha-carbon atom $C_\alpha$ is
asymmetric. In all known natural proteins, the alpha-carbons have the same chirality (they
are all left-handed) and the origin of this asymmetry is not known.
The list of amino acids is: 

1. Alanine, Isoleucine, Leucine, Methionine, Phenylalanine, Proline, Tryptophan, Valine.

2. Asparagine, Cysteine, Glutamine, Glycine, Serine, Threonine, Tyrosine.

3. Arginine, Histidine, Lysine.

4. Aspartic acid, Glutamic acid.

For instance, the chemical formula of alanine is: 

$$\mathrm{NH_2 - C_\alpha H(CH_3) - COOH} \qquad (2)$$

and that of tryptophan is: 

$$\mathrm{NH_2 - C_\alpha H(CH_2 - C_8H_6N) - COOH} \qquad (3)$$

The smallest residue is glycine, which is just a single H atom, and thus non chiral,
and the largest is tryptophan, which contains ten heavy (non hydrogen) atoms.

Figure 1: Peptide bond. The thick line denotes the backbone.

The above gross classification of amino acids refers to their interactions with water,
their natural solvent. Group 1 is made of non polar hydrophobic residues. The three
other groups are made of hydrophilic residues. From an electrostatic point of view, group
2 corresponds to polar neutral residues, group 3 corresponds to positively charged residues,
and group 4 to negatively charged ones.

The typical size of a protein ranges from approximately 100 amino acids for small 
proteins to 500 for long immuno-globulins. Due to this rather small size, knots are not 
present in proteins. 

A protein is made by polycondensation of amino acids, which can be schematically 
written as: 

$$\mathrm{NH_2 - C_\alpha HR_1 - COOH + NH_2 - C_\alpha HR_2 - COOH \;\longrightarrow\; NH_2 - C_\alpha HR_1 - CONH - C_\alpha HR_2 - COOH + H_2O} \qquad (4)$$

The repetition of this process produces the protein, a weakly branched polymer, characterized
by its chemical sequence $R_1, R_2, \ldots, R_N$.

The polycondensation produces a "peptide bond" CONH, represented in Fig. 1.

Due to electronic hybridization, this bond is strongly planar. 

One can distinguish two types of degrees of freedom in proteins: 

1. Hard degrees of freedom: these are the covalent bonds (linking covalently two atoms
along the chain), the valence angles (angle between two covalent bonds) and the 
peptide bond. These degrees of freedom are very rigid at room temperature, since, 
as we shall see later, their deformation requires energies much higher than kT. 

2. Soft degrees of freedom: they are essentially the torsion angles along the backbone 
chain, and of the side chains. Their energy scale is such that they can easily fluctuate 
at room temperature. 

1.2 The possible states of proteins 
Qualitative description of the phases 

Although it was long believed that proteins are either denatured or native, it seems now 
well established that they may in fact exist in at least three different phases. Originally, 
the phases referred to the biological activity of the protein. In the "native phase", the
protein has its full biological activity, whereas in the "denatured phase", it does not have
any biological activity. It was soon recognized that this change in activity was related to
some important structural changes in the protein.

Figure 2: Computer generated "ball and stick" representation of crambin.

The following classification is widely accepted:

1. Native state 

In this phase, the protein is said to be folded and has its full biological activity; it is 
compact, globular, and has a unique and well defined three dimensional structure. 
This implies that in this state, the active site is well defined. 

As we shall discuss later, the uniqueness of the folded state is quite puzzling, given
that the number of compact states of a homopolymer of size $N$ is known to behave
like $\mu^N$. Even with an extremely conservative value $\mu = 2$ and $N = 100$, this
represents an astronomically large number of compact configurations, and it is quite
amazing that proteins always find their way to the correct folded state. Basically,
the conformational entropy of the native state is zero.

In Fig. 2, we show a graphical representation of the protein crambin, generated from
its X-ray data.

2. Denatured states 

The denatured states are characterized by a lack of biological activity of the protein.
Depending on chemical conditions, it seems that there exist at least two denatured
phases:

(a) Coil state

In this state, the denatured protein is in a coil state: it has no definite shape,
like a homopolymer in a good solvent. Although in this phase there might be
local aggregation phenomena, it is fairly well described as the swollen phase of
a homopolymer in a good solvent. It is a state of large conformational entropy.

(b) Molten globule 

At low pH (acidic conditions), some proteins may exist in a compact state,
named "molten globule". This state is compact (the chain is globular); it
does not however have a well defined structure and bears strong resemblance
to the collapsed phase of a homopolymer in a bad solvent. It is believed
that this state is slightly less compact than the native state, and has finite
conformational entropy. Anticipating the next sections, the molten globule
seems to have a large content of secondary structure, but not necessarily at the
right place (with respect to the native state).

In vitro, the transition between the various phases is controlled by temperature, pH,
and denaturing agents (such as urea or guanidine).

Time scales 

There are two basic time scales in the problem: 

1. Microscopic 

The shortest time involved in protein dynamics is related to the vibrational modes
of the covalent bonds. The associated time scale is $10^{-15}$ s.

2. Macroscopic

Typical times for the folding of a protein range from $10^{-3}$ s to 1 s. This is many
orders of magnitude larger than the microscopic time, and it is quite puzzling to
understand why such a long time is necessary for such a small system to relax to
equilibrium. As we shall see, the main reason is that the energy landscape of a
compact chain is very rugged, with an exponentially large number of metastable
states, separated by high barriers. This situation is familiar in spin-glasses or other
disordered systems. We already mentioned that even in a homopolymer chain,
there are of the order of $\mu^N$ compact quasi-degenerate ground states, separated by
high barriers.
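
To make these orders of magnitude concrete, the following sketch counts compact states as $\mu^N$ with the conservative values $\mu = 2$ and $N = 100$ quoted earlier, and estimates how long a purely exhaustive search would take. Two ingredients are assumptions made only for illustration: the search rule (one compact state visited per microscopic time step) and the value of the microscopic time, taken here to be of order $10^{-15}$ s. The function names are hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def compact_state_count(mu, n_monomers):
    """Number of compact states, assumed to scale as mu**N."""
    return mu ** n_monomers


def exhaustive_search_time(mu, n_monomers, tau_micro):
    """Time in seconds to visit every compact state once.

    Assumption: one state is visited per microscopic time step tau_micro.
    """
    return compact_state_count(mu, n_monomers) * tau_micro


n_states = compact_state_count(2, 100)
t_search = exhaustive_search_time(2, 100, 1e-15)
print(f"compact states: {float(n_states):.3e}")
print(f"exhaustive search: {t_search:.3e} s = {t_search / SECONDS_PER_YEAR:.3e} years")
```

With these inputs the count is of order $10^{30}$ states and the exhaustive search time is of order $10^{15}$ s, far beyond the observed folding times. This is only a counting argument; it says nothing about how a rugged landscape is actually explored.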

Experimental techniques

There are several techniques which allow one to study the folded structure of proteins, 
and each one gives access to a different aspect of the problem. 

1. Biological techniques: they mainly use the recovery of the biological activity of the 
proteins. These techniques allow measurement of rate constants, reaction yields, 
etc... 

2. Measurements of the radius of gyration: one can measure the radius of gyration, as 
well as the structure factor of proteins, by use of X-ray or neutron scattering. 






3. NMR: this technique allows one to detect neighboring pairs of resonating protons, which
in turn give strong constraints for the spatial resolution of structures.

4. Circular dichroism: CD allows one to look for secondary structures (which will be defined
slightly later). It is sensitive to optical activity (due to the presence of α-helices).

5. X-ray crystallography: this is the most precise method to solve the three dimensional 
structure of proteins. In order to get any structural information about proteins, it is 
necessary to crystallize the proteins so as to freeze the positions of the atoms. This 
is a very difficult task, since one must first make a crystal from the protein, and 
then resolve its structure (in particular, one has to resolve the phase ambiguity). 

1.3 The different structural levels 

Since the discovery and resolution of many protein structures, it has become customary 
to distinguish several levels of organization in the structure. 

Primary structure 

The primary structure is just the chemical sequence of amino acids along the main backbone
chain. This chemical structure is routinely determined experimentally, by using
techniques such as electrophoresis, etc.

Secondary structures 

Pauling and Corey first predicted theoretically that proteins should exhibit some local
ordering, now known as secondary structures. Their prediction was based on energy
considerations: they showed that there are certain regular structures which maximize
the number of hydrogen bonds (H-bonds) between the C-O and the H-N groups of the
backbone. There are basically two such types of structures:

1. α-helices

These are one-dimensional structures. The H-bonds are aligned with the axis of the
helix (see Fig. 3). There are 3.6 amino acids per helix turn, and the typical size of
a helix is 5 turns.

2. β-sheets

These are quasi two-dimensional structures. The H-bonds are perpendicular to the
strands. A typical β-sheet has a length of 8 amino acids, and consists of approximately
3 strands (see Fig. 4).

Tertiary structure 

It is the compact packing of the secondary structures which makes up the tertiary structure:
it is essentially the full three dimensional structure of the protein.

Figure 3: An α-helix. The dashed lines represent the H-bonds.

Figure 4: A β-sheet. The dashed lines represent the H-bonds.

Quaternary structure

Large proteins are made up of entities called domains, which are compact globular regions,
separated by a few amino acids. These domains are mobile relative to one another and
their arrangement is called the quaternary structure.

1.4 Interactions 

It is important to analyze the interactions between the various atoms present in the
system. At the microscopic level there are only Coulombic interactions; such a microscopic
approach is at present out of reach.

Chemists and physicists have instead analyzed interactions at a more macroscopic 
level: they introduce semi-empirical interactions which can then be included in energy 
minimizations, molecular dynamics, Monte Carlo calculations, etc... 

Along these lines, one may distinguish two main types of interactions: 

1. bonded 

These are the covalent bonds between the atoms of the protein. One may further 
define: 

(a) connectivity bonds: they are the chemical bonds along the backbone or side 
chains. 

(b) sulfur bridges: these are covalent bonds which may form only between the 
sulfur atoms of the cysteine residues. 

2. non bonded 

There are several types of non-bonded interactions in proteins: 

(a) Coulomb: some atoms are assigned partial charges (smaller than one electronic 
charge) , and interact through Coulomb interaction (the question of the relative 
dielectric constant is a matter of debate). 

(b) Van der Waals: this interaction accounts for the strong steric repulsion at 
short distances, and the long range dipolar attraction at larger distances. It is 
usually represented by a Lennard-Jones 6-12 potential of the form: 

$$v(r) = \frac{A}{r^{12}} - \frac{B}{r^{6}} \qquad (5)$$

(c) Hydrogen bonds: the interaction responsible for the formation of H-bonds 
can be introduced explicitly by a 6-10 potential similar to the Lennard-Jones 
potential, but it is now quite accepted that H-bonds are just a result of the 
combination of Coulomb and Van der Waals interactions. 

3. The role of water: water is a dipolar molecule, and thus has strong interactions 
with charged or dipolar groups (see the above classification of the residues). Since 
proteins are active in an aqueous environment, water must be taken into account. 
This is the origin of the hydrophobic effect: hydrophobic groups will be buried 
inside the globule, whereas hydrophilic groups will be on the surface, in contact 
with water. However, when one looks at the database of real proteins, it turns out
that the situation is not so clear cut: there is a substantial probability (~ 35%) to 
find a hydrophobic residue on the surface of a protein, or to find a hydrophilic one 
buried inside. The situation is simpler for charged residues (including the ends of 
the chain which are ionized in water), which are almost always found on the outer 
surface of the protein. 
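
As an illustration of the Lennard-Jones 6-12 form mentioned in item 2(b), the sketch below evaluates a potential with a repulsive $1/r^{12}$ term and an attractive $1/r^{6}$ term, and locates its minimum analytically. The coefficient names `a` and `b` and the numerical values used are hypothetical; they are not taken from any force field.

```python
def lennard_jones(r, a, b):
    """6-12 potential v(r) = a / r**12 - b / r**6, with a > 0 and b > 0."""
    return a / r**12 - b / r**6


def lj_minimum(a, b):
    """Position and depth of the minimum, obtained from dv/dr = 0.

    dv/dr = -12 a / r**13 + 6 b / r**7 vanishes at r**6 = 2 a / b,
    where the potential equals -b**2 / (4 a).
    """
    r_min = (2.0 * a / b) ** (1.0 / 6.0)
    v_min = -b * b / (4.0 * a)
    return r_min, v_min


# Hypothetical coefficients chosen so that the minimum sits at r = 1 with depth 1.
r_min, v_min = lj_minimum(1.0, 2.0)
for r in (0.8, 0.9, 1.0, 1.2, 2.0):
    print(f"r = {r:.1f}  v(r) = {lennard_jones(r, 1.0, 2.0):+.4f}")
print("minimum at r =", r_min, "with depth", v_min)
```

The steep positive values below the minimum represent the steric repulsion, and the slowly decaying negative tail represents the longer range attraction.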

1.5 Energy scales 

The natural energy scale in protein chemistry is the kilocalorie per mole, denoted
kCal/mole. The correspondence with more physical units is 300 K ≈ 0.6 kCal/mole.

The typical denaturation temperature for a protein is 1 kCal/mole. 

There are two widely separated energy scales involved: 

1. bonded interactions: their energy range is from 50 kCal/mole to 150 kCal/mole.
They correspond typically to 100 kT at room temperature, and therefore, are not
excited thermally.

2. non bonded interactions: their energy range is from 1 to 5 kCal/mole. They are 
thermally excited at room temperature, and are thus responsible for the folding and 
all the observed thermodynamical properties of proteins. 
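
The separation between the two energy scales can be checked with a short conversion. The sketch below expresses the quoted energy ranges in units of kT at 300 K and evaluates the corresponding Boltzmann factors. The only external input is the molar gas constant in kCal/(mole K); the function names are hypothetical.

```python
import math

# Molar gas constant (Boltzmann constant times Avogadro's number), in kCal/(mole K).
GAS_CONSTANT_KCAL = 1.987204e-3


def thermal_energy(temperature_k):
    """kT per mole, in kCal/mole, at the given temperature in kelvin."""
    return GAS_CONSTANT_KCAL * temperature_k


def in_units_of_kt(energy_kcal, temperature_k=300.0):
    """Express an energy given in kCal/mole in units of kT."""
    return energy_kcal / thermal_energy(temperature_k)


def boltzmann_factor(energy_kcal, temperature_k=300.0):
    """exp(-E / kT): relative weight of an excitation of energy E."""
    return math.exp(-in_units_of_kt(energy_kcal, temperature_k))


print(f"kT at 300 K: {thermal_energy(300.0):.3f} kCal/mole")
for label, energy in (("bonded, low", 50.0), ("bonded, high", 150.0),
                      ("non bonded, low", 1.0), ("non bonded, high", 5.0)):
    print(f"{label:>17}: {in_units_of_kt(energy):7.1f} kT, "
          f"Boltzmann factor {boltzmann_factor(energy):.2e}")
```

Bonded energies come out at roughly 80 to 250 kT, with vanishingly small Boltzmann factors, whereas non bonded energies are a few kT and are easily excited.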

In a simplified description, it seems natural to consider all the bonded interactions as 
frozen (implying that the primary structure is quenched) , and take into account only the 
non bonded terms. 

1.6 Summary 

To summarize this section, we emphasize again that the hydrophobic interaction is a strong 
driving force for the collapse of proteins. In addition, there are competing interactions 
between the various amino acids (Coulomb, Van der Waals). This energetic frustration,
together with the topological constraints induced by the chain, gives rise to the existence
of an exponential number of metastable states. The folded state is a compromise which
minimizes the total free energy of the protein, subject to the chain constraint. In the 
following chapters, we shall present schematic heteropolymer models to describe some 
aspects of the physics of proteins. The questions we will more specifically address are: 

1. What is the nature of the transition: is it a liquid to crystal type of transition, or 
rather a glass transition? 

2. How can one understand the uniqueness of the folded state, given the extremely large
number of metastable states?

3. What are the mechanisms of folding? How can one describe the short and long time
dynamics of proteins?
2 Heteropolymer models for proteins



2.1 Introduction

As mentioned above, there are two main approaches to the (in vitro) protein folding 
transition. One may first view the folded state as an ordered microcrystal, with helix-like 
and sheet-like domains. This type of approach clearly relies on the universal character of 
secondary structures in proteins, and will not be considered here. 

In this review, we will be concerned with the tertiary structure, and rather emphasize
the heterogeneity of the primary sequences of proteins. These non-random (evolution-selected)
macromolecules will be modelled by disordered polymers of various sorts. In
these models, the relevant interactions (monomer-monomer or monomer-solvent) are disordered,
and the disorder is quenched, to account for the fixed character of the chemical
sequence $(R_1, R_2, \ldots, R_N)$. This hypothesis of a quenched disorder is not always obvious
(there may be for instance an electrostatic charge on a monomer which depends on the
monomer environment, hence an annealed character) but we will restrict our study to
this case. This quenched disorder approach clearly has a lot in common with spin glass
transitions. In particular it is tempting to consider the native state of a heteropolymer
as the "almost unique" frozen state of the type present in spin glass mean field theories.
A major difference though with spin glass theories is linked to the chain constraint, which,
inter alia, induces long-range interactions along the primary sequence.

The problems one faces in heteropolymers are therefore twofold, concerning both the
"hetero" and the "polymer" aspects. As in spin glasses, it is not easy to have precise
information for a fixed disorder configuration (i.e. a single chain); one is often led to
average over all possible disorder configurations (i.e. over all possible disordered chains).
One then follows "well established" replica routes (mean-field, variational approach, ...).
It is a characteristic feature of spin glasses that this thermodynamic approach often leads
to metastable states, and therefore raises questions on the dynamics of the system.

The frequent inadequacy of the heteropolymer approach to deal with a fixed primary
sequence makes the connection with biology rather tenuous and does not help much in
bringing biologists and physicists together. Furthermore, the thermodynamic limit that
one considers in heteropolymer studies must be cautiously applied to a 100-monomer
protein, not to mention the dynamical problems previously alluded to. If the folding
transition is understood as a "strongly cooperative folding process", we nevertheless believe that
such an approach is useful since it may connect the unusual folding transition with the
unusual transitions of disordered condensed matter physics. We have adopted the following
plan to disentangle as much as possible the difficulties linked to the two aspects of
heteropolymers. Most of the basic polymer theory has been postponed to the Appendices.
A brief summary of spin glass thermodynamics is presented in section 2.2. Section 2.4
deals with various models of randomness that one may find in heteropolymers, and their
physical interpretation. The random hydrophilic-hydrophobic chain will be considered in
section 2.5. In section 2.6, we consider the "random bond" model which has a Random
Energy Model type freezing transition in high dimensions. Other types of randomness are
briefly considered in section 2.7. The relevance of these ideas for real proteins is briefly
examined in subsection 2.9. Let us stress here the fact that the state of the art concerning
the three dimensional heteropolymer folding problem is not even comparable to that of
spin glasses.



2.2 A brief summary of spin glass theory 

A detailed account of this field is given elsewhere in this book, so we will content
ourselves with the minimum amount needed for the heteropolymer folding problem. The
spin glass problem originally arose in the study of disordered magnetic alloys, where
magnetic impurities (e.g. Mn), quenched at random positions in the host lattice (e.g.
Cu) interact with one another through competing (oscillating) interactions. This model
has been largely extended, and the interactions between the spins are accordingly
of various sorts (exchange, dipolar, Dzyaloshinskii-Moriya, ...). There is however a general
consensus that the very essence of the spin glass problem (i.e. frustration + quenched
disorder) is captured by the Edwards-Anderson Hamiltonian:

$$\mathcal{H} = -\sum_{i<j} J_{ij} S_i S_j \qquad (6)$$

where $S_i = \pm 1$ is the (Ising) spin of impurity $i$ at position $\vec{r}_i$ and $J_{ij} = J(\vec{r}_i, \vec{r}_j)$ is
the random coupling between impurities $i$ and $j$, distributed with a law $P(\{J_{ij}\})$. The
frustration stems from the fact that the couplings $\{J_{ij}\}$ may take positive (ferromagnetic)
or negative (antiferromagnetic) values. Furthermore, the positions $\vec{r}_i$ of the impurities (and
therefore the $J_{ij}$ themselves) are quenched variables, which means that they evolve with
a time scale infinitely longer than the time scales of the thermodynamic variables $S_i$.
This implies that for a given set of $\{J_{ij}\}$, the free energy $F(\{J_{ij}\})$ at temperature $T$ is
obtained as:

$$F(\{J_{ij}\}) = -T \log Z(\{J_{ij}\}) \qquad (7)$$

with

$$Z(\{J_{ij}\}) = \mathrm{Tr}_{\{S_i\}} \exp\left(-\beta \mathcal{H}(\{J_{ij}\})\right) \qquad (8)$$

with $\beta = 1/T$ (throughout this article, we set Boltzmann's constant $k_B = 1$). Note that
$F(\{J_{ij}\})$ is also a random variable; for short range interactions, we now show that, when
the number of spins becomes very large, $F(\{J_{ij}\})$ is sharply peaked around its (disorder)
averaged value $\overline{F}$, where

$$\overline{F} = \int \prod_{i<j} dJ_{ij}\, P(\{J_{ij}\})\, F(\{J_{ij}\}) \qquad (9)$$

This property is called self-averageness of the free energy, and the argument goes as follows:
suppose we divide a $d$-dimensional system of $N$ spins $S_i$ into subsystems containing
each $m$ spins with $1 \ll m \ll N$. The system's total free energy is the sum of two
contributions:

(i) a contribution from each subsystem, of order $m$ per "domain".

(ii) a contribution from each "domain wall" between neighboring subsystems, of order
$m^{(d-1)/d}$ for short-range interactions.

For $m$ large, the second contribution is negligible compared to the first. Equation
(9) results by considering each subsystem to represent a realization of the couplings $\{J_{ij}\}$.
If $l$ denotes a coherence length of the system, $m$ can be thought of as the number of spins
in a coherence volume $l^d$ (a recent discussion can be found in the literature).
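
The definitions of the free energy for a fixed set of couplings and the self-averaging property can be illustrated on the simplest short-range example, an open chain of Ising spins with random nearest-neighbor couplings. This choice is an assumption made for illustration only: a one-dimensional open chain is not frustrated, so the sketch shows self-averaging, not spin glass behavior. The explicit trace over all spin states is compared with the closed form available for the open chain, and the sample-to-sample spread of the free energy per spin is then estimated for increasing sizes. All function names are hypothetical.

```python
import itertools
import math
import random
import statistics


def free_energy_trace(couplings, temperature):
    """F = -T log Z, with Z obtained by an explicit trace over all 2**N spin states."""
    n_spins = len(couplings) + 1
    beta = 1.0 / temperature
    z = 0.0
    for spins in itertools.product((-1, 1), repeat=n_spins):
        energy = -sum(j * spins[i] * spins[i + 1] for i, j in enumerate(couplings))
        z += math.exp(-beta * energy)
    return -temperature * math.log(z)


def free_energy_chain(couplings, temperature):
    """Closed form for the open chain: Z = 2 * prod_i 2 cosh(beta * J_i)."""
    beta = 1.0 / temperature
    log_z = math.log(2.0) + sum(math.log(2.0 * math.cosh(beta * j)) for j in couplings)
    return -temperature * log_z


def free_energy_per_spin_spread(n_spins, temperature, n_samples, rng):
    """Sample-to-sample standard deviation of F/N for unit-variance Gaussian couplings."""
    values = []
    for _ in range(n_samples):
        couplings = [rng.gauss(0.0, 1.0) for _ in range(n_spins - 1)]
        values.append(free_energy_chain(couplings, temperature) / n_spins)
    return statistics.stdev(values)


rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(7)]
print("trace      :", free_energy_trace(sample, 1.0))
print("closed form:", free_energy_chain(sample, 1.0))
for n in (16, 256, 4096):
    print(n, free_energy_per_spin_spread(n, 1.0, 200, rng))
```

In this example the free energy is a sum of independent contributions, one per bond, so the spread of $F/N$ is expected to decrease roughly as $1/\sqrt{N}$, which is the content of the self-averaging statement in its simplest form.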

It is fair to say that the most detailed study of spin glasses has been made with
great difficulty in the mean-field limit (infinite range interactions). In this case, it can
also be shown that the self-averageness of the free-energy holds in the thermodynamic
limit. The original Edwards-Anderson Hamiltonian has been widely generalized (separable
interactions, vectorial spins, Potts variables, ...). Restricting the discussion to Ising spins,
we consider a typical spin glass Hamiltonian to be given by:



$$\mathcal{H} = -\sum_{1 \le i_1 < i_2 < \cdots < i_p \le N} J_{i_1 \cdots i_p}\, S_{i_1} \cdots S_{i_p} \qquad (10)$$

where $p \ge 2$, and the couplings $\{J_{i_1 \cdots i_p}\}$ are independent random variables, distributed
according to a Gaussian law:



In (11), the $N$-dependent normalization ensures that the free energy is extensive.

Two main lines have been pursued in the (static) study of mean field spin glasses:

1) the replica method where one averages a priori over all disorder configurations,
through the identity:

$$\overline{F} = -T\, \overline{\log Z} = -T \lim_{n \to 0} \frac{\overline{Z^n} - 1}{n} \qquad (12)$$

2) the TAP equations, which give, for each disorder configuration $\{J_{i_1 \cdots i_p}\}$, the free
energy local minima. These equations are difficult to study either analytically or numerically,
except in certain limits.
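
The identity underlying the replica method, namely that the disorder average of $\log Z$ is the $n \to 0$ limit of $(\overline{Z^n} - 1)/n$, can be checked numerically on synthetic data. In the sketch below the "partition functions" are simply log-normal random numbers, an assumption made for illustration and not derived from any Hamiltonian, and $n$ is taken as a small real number. The actual replica method computes $\overline{Z^n}$ for integer $n$ and continues analytically, which this sketch does not attempt.

```python
import math
import random


def quenched_log_average(z_samples):
    """Disorder average of log Z (the quenched average)."""
    return sum(math.log(z) for z in z_samples) / len(z_samples)


def replica_estimate(z_samples, n):
    """(mean(Z**n) - 1) / n, which tends to mean(log Z) as n -> 0."""
    mean_zn = sum(z ** n for z in z_samples) / len(z_samples)
    return (mean_zn - 1.0) / n


def annealed_log_average(z_samples):
    """log of mean(Z): the annealed average, never smaller than the quenched one."""
    return math.log(sum(z_samples) / len(z_samples))


rng = random.Random(1)
# Synthetic samples: log Z is Gaussian with zero mean and standard deviation 2.
z_samples = [math.exp(rng.gauss(0.0, 2.0)) for _ in range(2000)]
for n in (1.0, 0.1, 0.01, 0.001):
    print(f"n = {n:<6} (mean(Z^n) - 1)/n = {replica_estimate(z_samples, n):.5f}")
print("quenched mean(log Z):", quenched_log_average(z_samples))
print("annealed log(mean Z):", annealed_log_average(z_samples))
```

As $n$ decreases the estimate approaches the quenched average, while the annealed value $\log \overline{Z}$ stays above it; the gap between the two is what makes the quenched average nontrivial.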

Both methods point towards a broad division of spin glass mean field models into two
categories:

(i) usual spin glasses with a continuous Parisi replica symmetry breaking scheme
($p = 2$). The high temperature phase has only the paramagnetic solution, whereas the
low temperature phase has an exponentially large number of TAP solutions, among which
the low lying free energy solutions build up the Parisi replica order parameter.

(ii) spin glasses with a one step replica symmetry breaking scheme ($p \ge 3$). These
models have a low temperature phase rather similar to the previous ones, but possess two
disordered phases above the static critical temperature: one with an exponential number
of TAP metastable states, separated by high energy barriers, and the regular paramagnet
with no metastability at all. These metastable states have of course dynamical significance
(that is why these models may be close to the real glass transition). Moreover, since we
are dealing with mean field models, they have (in the thermodynamic limit) an infinite
lifetime, and must therefore be included in the thermodynamics.

The difference between these two classes may be traced back to their behavior in the 
absence of disorder: when pure, the latter models undergo first order transitions, implying 
the existence of a spinodal (dynamical) temperature above the critical temperature. 

It is important to note that the $p \to \infty$ model can be identified with Derrida's Random
Energy Model (REM), which appears as a generic model in the physics of mean field
disordered systems. For instance, if one replaces the Ising spins of the Edwards-Anderson
model by $q$-state Potts variables, one also recovers the REM in the limit $q \to \infty$.

Beyond the mean-field picture, one has basically to resort to Imry-Ma like domain
arguments, to variational methods or to numerical calculations. In particular, the
very existence of a spin glass phase in three dimensions, not to mention its nature, is still
an unsettled question. Related models of interest include the random field Ising model,
the role of impurities on the Abrikosov vortex lattice, ..., and the heteropolymer folding
transitions that we now present.

2.3 Quenched disorder in polymers 

In the case of heteropolymers, we are interested in the statistical mechanics of a $d$-dimensional
polymer chain, with random quenched interactions (either with the solvent or with
itself). The position of monomer $i$ ($i = 1, 2, \ldots, N$) is denoted by $\vec{r}_i$. The frustration in
this case stems from conflicting terms in the Hamiltonian and from the geometric chain
constraint $g(\vec{r}_i, \vec{r}_{i+1})$. Throughout this work, we will restrict ourselves to the simplest
forms of chain constraint, namely

(i) $g(\vec{r}_i, \vec{r}_{i+1}) = \delta(|\vec{r}_i - \vec{r}_{i+1}| - a)$ for discrete chains ($a$ is the monomer length)

(ii) $g(\vec{r}_i, \vec{r}_{i+1}) \to \exp\left(-\frac{d}{2a^2}\left(\frac{d\vec{r}}{ds}\right)^2\right)$ in a continuous description of the chain ($s$ denoting
the curvilinear abscissa along the chain).

Other choices are discussed in appendix A. Furthermore, we will only consider randomness
in the two body interactions $v_{ij}(\vec{r}_i, \vec{r}_j)$ or $v_{s,s'}(\vec{r}_s, \vec{r}_{s'})$. Whenever necessary, we
will also include the usual (homopolymer) many body interactions (e.g. $w_0(\vec{r}_i, \vec{r}_j, \vec{r}_k)$
for the three body term). Choosing the discrete notation, the partition function of the
heteropolymer chain reads

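$$Z(\{v_{ij}\}) = \int \prod_i d\vec{r}_i \prod_i g(\vec{r}_i, \vec{r}_{i+1}) \exp\left(-\beta \mathcal{H}(\{v_{ij}\})\right) \qquad (13)$$

where the (reduced) Hamiltonian is

$$\beta \mathcal{H}(\{v_{ij}\}) = \frac{1}{2} \sum_{i \ne j} v_{ij}(\vec{r}_i, \vec{r}_j) + \frac{1}{6} \sum_{i \ne j \ne k} w_0(\vec{r}_i, \vec{r}_j, \vec{r}_k) + \ldots \qquad (14)$$

and the dots ... include the possibility of higher order terms. The free energy of the chain reads

$$F(\{v_{ij}\}) = -T \log Z(\{v_{ij}\}) \qquad (15)$$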

The self-averageness argument of the free energy may be presented in a slightly different
way from the spin glass case. Consider a "soup" of $M$ random chains of $N$ monomers
(a monomer should be thought of as an amino acid). Each of these chains represents a
different choice of $\{v_{ij}\}$, i.e. a different primary sequence. The free energy per chain of
the soup is given by:

$$F = \frac{1}{M} \sum_{k=1}^{M} F(\{v_{ij}^{(k)}\}) \qquad (16)$$

where $\{v_{ij}^{(k)}\}$ denotes the interactions of chain $k$.
This expression neglects the interchain interactions, and is thus a priori valid, provided
that (i) the interactions in the soup are short-ranged (ii) the soup is dilute enough (one
chain problem), or concentrated enough (in a melt, interactions between the chains are
screened).

For large $M$, one may interpret $F$ as an average over all possible choices of $\{v_{ij}\}$.
Denoting this average by $\overline{F}$, we have:

$$\overline{F} = F = \int \prod_{i<j} dv_{ij}\, P(\{v_{ij}\})\, F(\{v_{ij}\}) \qquad (17)$$

For a single chain (dilute soup problem), this point of view clearly leads to the use of
replicas, in order to perform the quenched average. This would yield the typical properties
of a typical chain of the soup. In a melt (concentrated soup problem), one may avoid the
use of replicas, by using equation (16). On the contrary, if one wants to study a specific
primary sequence, i.e. a given set of $\{v_{ij}\}$ (without any averaging procedure), one must
resort to the self-consistent field method, described in appendix B. Equations (191) are
in some sense the TAP equations of our problem. Solving these equations would require
an involved numerical treatment, which has not been undertaken so far.

We now examine various possible choices of the two-body interaction term $v_{ij}(\vec{r}_i, \vec{r}_j)$,
with the (reasonable) restriction that we consider only translationally invariant forms:

$$v_{ij}(\vec{r}_i, \vec{r}_j) = v_{ij}(\vec{r}_i - \vec{r}_j) \qquad (18)$$

One may first consider each monomer $i$ to be characterized by a single random (scalar)
"charge" $\xi_i$: the most general choice then reads

$$v_{ij}(\vec r_i - \vec r_j) = v_0(\vec r_i - \vec r_j) + \beta a(\vec r_i - \vec r_j)(\xi_i + \xi_j) + \beta b(\vec r_i - \vec r_j)\,\xi_i \xi_j \qquad (19)$$

where $v_0(x)$, $a(x)$ and $b(x)$ are regular functions of $x$. For neutral homopolymers in a
good solvent, $v_0(x)$ is usually taken as a short range repulsive function (excluded volume
term). For polyelectrolytes, $v_0(x)$ is basically the Coulombic interaction. As for the
random "charges" $\{\xi_i\}$, we will consider them as independent random variables. Popular
choices are the binary or Gaussian forms.
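As a concrete reading of equation (19), the following minimal sketch evaluates the contact value of the pair interaction for two monomers in contact and draws the "charges" from the two popular laws. The function names, the unit-variance normalization and the replacement of $v_0(x)$, $a(x)$, $b(x)$ by constants are assumptions made for illustration only.

```python
import random


def pair_coupling(xi_i, xi_j, v0=1.0, a=1.0, b=0.0, beta=1.0):
    """Contact value of (19): v0 + beta*a*(xi_i + xi_j) + beta*b*xi_i*xi_j.
    v0, a and b stand for the (assumed constant) contact strengths."""
    return v0 + beta * a * (xi_i + xi_j) + beta * b * xi_i * xi_j


def draw_charges(n, law="binary", seed=0):
    """Independent random charges, either +/-1 (binary) or unit Gaussian."""
    rng = random.Random(seed)
    if law == "binary":
        return [rng.choice((-1.0, 1.0)) for _ in range(n)]
    if law == "gaussian":
        return [rng.gauss(0.0, 1.0) for _ in range(n)]
    raise ValueError("unknown law: " + law)
```

Setting `b=0` keeps only the additive term in the charges, while `a=0` keeps only the product term $\xi_i \xi_j$; these are the two single-charge limits discussed in the models that follow.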

More generally, one may consider that a monomer is characterized by $z$ independent
random "charges" $\xi_i^{\alpha}$ ($\alpha = 1, 2, \dots, z$), linked to its electrostatic charge, its hydrophilicity,
its helix forming tendency, ... Of course, the character of these "charges" can be chosen
to be more complicated (vectorial, tensorial, ...) but we shall stick to the simple scalar
case. A very general model of heteropolymers may be then defined through a two body
interaction:

$$v_{ij}(\vec r_i - \vec r_j) = \sum_{\alpha=1}^{z} c_\alpha\, v_{ij}^{\alpha}(\vec r_i - \vec r_j) \qquad (20)$$

where $v_{ij}^{\alpha}(\vec r_i - \vec r_j)$ is defined through equation (19):

$$v_{ij}^{\alpha}(\vec r_i - \vec r_j) = v_0^{\alpha}(\vec r_i - \vec r_j) + \beta a^{\alpha}(\vec r_i - \vec r_j)(\xi_i^{\alpha} + \xi_j^{\alpha}) + \beta b^{\alpha}(\vec r_i - \vec r_j)\,\xi_i^{\alpha}\xi_j^{\alpha} \qquad (21)$$

and $c_\alpha$ is a real number. Of course, this general model is not very convenient to investigate,
although it is clearly in the line of the Hopfield model of spin glasses. For instance, it
is possible to show that (i) if $b^{\alpha}$ is a zero range $\delta$ function, (ii) if $a^{\alpha} = 0$, and (iii) if one takes
the limit $z \to \infty$, then the disordered contribution in the two body interaction yields a
random bond model that we consider below. To illustrate some physical points, we now
present a few simple disordered models.
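The passage from many "charges" to a random bond model can be checked numerically. The sketch below rests on assumptions that are not in the text: binary charges $\xi_i^{\alpha} = \pm 1$, equal weights $c_\alpha = 1/\sqrt{z}$, $a^{\alpha} = 0$, and hypothetical function names. It builds the contact couplings $w_{ij} = z^{-1/2}\sum_\alpha \xi_i^{\alpha}\xi_j^{\alpha}$ for three monomers and measures two quantities: the kurtosis of $w_{12}$ (equal to 3 for a Gaussian) and the triangle product $\overline{w_{12} w_{23} w_{31}}$, which equals 1 for a single charge (an unfrustrated, Mattis-like situation) and decays as $1/\sqrt{z}$ when the bonds become independent.

```python
import math
import random


def bond_coupling(charges_i, charges_j):
    """w_ij = z^{-1/2} * sum_alpha xi_i^alpha xi_j^alpha
    (equal weights c_alpha = 1/sqrt(z) are an assumption of this sketch)."""
    z = len(charges_i)
    return sum(x * y for x, y in zip(charges_i, charges_j)) / math.sqrt(z)


def triangle_statistics(z, n_samples=10000, seed=1):
    """Return (kurtosis of w_12, mean of w_12 * w_23 * w_31) estimated over
    n_samples independent draws of binary charges for three monomers."""
    rng = random.Random(seed)
    m2 = 0.0
    m4 = 0.0
    tri = 0.0
    for _ in range(n_samples):
        xi = [[rng.choice((-1.0, 1.0)) for _ in range(z)] for _ in range(3)]
        w12 = bond_coupling(xi[0], xi[1])
        w23 = bond_coupling(xi[1], xi[2])
        w31 = bond_coupling(xi[2], xi[0])
        m2 += w12 ** 2
        m4 += w12 ** 4
        tri += w12 * w23 * w31
    m2 /= n_samples
    m4 /= n_samples
    tri /= n_samples
    return m4 / m2 ** 2, tri
```

With `z = 1` every triangle of bonds is perfectly correlated, whereas for `z` of order a hundred the couplings are already close to independent Gaussian variables, which is the random bond limit invoked above.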

2.4 Some basic models of disordered polymers



The random hydrophilic-hydrophobic chain 

In this model, one emphasizes the heterogeneity of the interactions between the amino
acids and the water, i.e. the solvent. In the context of protein folding, it is widely believed
that the interactions with the hydrophobic residues (Trp, Ile, Phe, ...) are the main driving
forces for the collapse of proteins: hydrophobic residues in water can be thought of as
being in a bad solvent, and thus have an attractive effective interaction. The monomers
are described by a single "charge", $\xi_i$, and the monomer-solvent interactions are described
by the Hamiltonian

$$\mathcal{H}_{ms} = -\sum_{i=1}^{N}\sum_{\alpha=1}^{\mathcal{N}} \xi_i\, a(\vec r_i - \vec R_\alpha) \qquad (22)$$

where $a$ denotes a short-ranged monomer-solvent molecule interaction, $N$ and $\mathcal{N}$ denote
respectively the number of monomers and of solvent molecules, and $\vec r_i$ and $\vec R_\alpha$ their
respective positions. Following appendix A, and assuming that the system is incompressible,
we have

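$$\mathcal{H}_{ms} = +\sum_{i=1}^{N}\sum_{j=1}^{N} a(\vec r_i - \vec r_j)\,\xi_i - A \sum_{i=1}^{N} \xi_i \qquad (23)$$

where $A = \sum_{\vec r} a(\vec r)$. The second term in (23) is a constant for a given sequence (equal to $-A\sum_i \xi_i$), and will
therefore be omitted henceforth. We will write this hydrophilic-hydrophobic Hamiltonian
in a more symmetric way

$$\beta\mathcal{H}(\{\xi_i\}) = \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} \left( v_0(\vec r_i - \vec r_j) + \beta a(\vec r_i - \vec r_j)(\xi_i + \xi_j) \right) \qquad (24)$$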

which indeed looks like equation (19) with $b = 0$. Up to this point, we have not specified
the probability distribution of the disorder variables $\{\xi_i\}$. Striving for simplicity, we will
write

$$P(\{\xi_i\}) = \prod_{i=1}^{N} h(\xi_i) \qquad (25)$$

namely we assume that the $\{\xi_i\}$ are independent random variables drawn with the same
probability law $h$. This problem will be studied in section 2.5.



The random-bond chain

This model plays a central role in the mean-field theory of heteropolymers, since its freezing
phase transition is described by the Random Energy Model. The two body interaction is
given by

$$v_{ij}(\vec r_i - \vec r_j) = v_0(\vec r_i - \vec r_j) + \beta w_{ij}(\vec r_i - \vec r_j) \qquad (26)$$

where the disorder contributions $w_{ij}$ are independent variables (they are defined for $i < j$).
This means that two couples $(i,j)$ and $(i',j')$ of the same chemical nature, but at
different positions of the primary sequence, are characterized by independent values of
the interactions $w_{ij}(\vec r_i - \vec r_j)$ and $w_{i'j'}(\vec r_{i'} - \vec r_{j'})$. This model therefore assumes a very
strong influence of the environment on the heteropolymer. As mentioned above, it can
also be seen as a particular limiting case, where the number $z$ of independent "charges"
characterizing a monomer becomes very large. Again looking for simplicity, we will assume
the probability distribution to be given by

$$P(\{w_{ij}\}) = \prod_{(ij)=1}^{N(N-1)/2} p(w_{ij}) \qquad (27)$$

where the condition $(i < j)$ is tacitly assumed. This case will be studied in section 2.6.



The "random sequence" chain 

In marked contrast to the previous case, the random sequence chain assumes that the
monomers carry a single "charge" and that the environment plays no role at all. The
two-body interaction reads:

$$v_{ij}(\vec r_i - \vec r_j) = v_0(\vec r_i - \vec r_j) + \beta b(\vec r_i - \vec r_j)\,\xi_i \xi_j \qquad (28)$$

where $v_0$ again refers to excluded volume effects, and the second term describes the
random interaction. This term has very different interpretations (and physical behavior) for
$b > 0$ and $b < 0$. The former allows one to think of equation (28) as describing a randomly
(electrostatically) charged chain, or polyampholyte. The latter describes random
copolymers, e.g. AB copolymers, where monomers of type A and B have a tendency to phase
separate. If $b$ is constant, and notwithstanding the (crucial) chain constraint, the spin
analogs of these models are the antiferromagnetic ($b > 0$) or ferromagnetic ($b < 0$) Mattis
models. This is why the copolymer case may illustrate how the low temperature phase
may be linked with (or coded in) the primary sequence "charges" $\{\xi_i\}$. These models are
considered in section 2.7, with the same assumption as in equation (25):

$$P(\{\xi_i\}) = \prod_{i=1}^{N} h(\xi_i) \qquad (29)$$

that is, the independence of the random variables $\{\xi_i\}$.

We have presented very simplified models of disordered self-interacting polymers,
which may have some relevance to the physics of protein folding. It is clear that many
other polymer problems are relevant to the physics of biopolymers. To name but a few,
let us mention

(i) charged polymers subject to an electric field, in relation to electrophoresis.

(ii) polymers at interfaces, in relation to membranes or adsorption.

(iii) polymers in random media.

We now turn to a more detailed study of three "basic" heteropolymer models.



2.5 The random hydrophilic-hydrophobic chain 

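The Hamiltonian for this case has been derived above, and reads

$$\beta\mathcal{H}(\{\xi_i\}) = \frac{1}{2}\sum_{i \neq j} \left[v_0 + \beta(\xi_i + \xi_j)\right] \delta(\vec r_i - \vec r_j) + \frac{w_0}{6}\sum_{i \neq j \neq k} \delta(\vec r_i - \vec r_j)\,\delta(\vec r_j - \vec r_k) + \frac{y_0}{24}\sum_{i \neq j \neq k \neq l} \delta(\vec r_i - \vec r_j)\,\delta(\vec r_j - \vec r_k)\,\delta(\vec r_k - \vec r_l) \qquad (30)$$

In equation (30), we have assumed all interactions to be short ranged, and we have also
included three and four body terms for reasons that will soon become clear.
Following section 2.3, the partition function reads: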



$$Z(\{\xi_i\}) = \int \prod_i d\vec r_i \prod_i g(\vec r_i, \vec r_{i+1}) \, \exp\left(-\beta\mathcal{H}(\{\xi_i\})\right) \qquad (31)$$

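leading to a disorder averaged free energy:

$$\overline{F} = -T \int \prod_i d\xi_i\, h(\xi_i)\, \log Z(\{\xi_i\}) \qquad (32)$$

Our calculations will be performed with the particular choice:

$$h(\xi_i) = \frac{1}{\sqrt{2\pi\xi^2}} \exp\left(-\frac{(\xi_i - \xi_0)^2}{2\xi^2}\right) \qquad (33)$$

With our conventions, a positive $\xi_0$ corresponds to a majority of hydrophilic links.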
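Rewriting (30) and (31) in a continuous form, we have:

$$Z(\{\xi(s)\}) = \int \mathcal{D}\vec r(s)\, \exp\left(-\frac{d}{2a^2}\int_0^N ds\, \dot{\vec r}^{\,2}(s) - \beta\mathcal{H}(\{\xi(s)\})\right) \qquad (34)$$

with

$$\begin{aligned}
\beta\mathcal{H}(\{\xi(s)\}) = {} & \frac{1}{2}\int_0^N ds \int_0^N ds' \left[v_0 + \beta\left(\xi(s) + \xi(s')\right)\right] \delta(\vec r(s) - \vec r(s')) \\
& + \frac{w_0}{6}\int_0^N ds \int_0^N ds' \int_0^N ds''\, \delta(\vec r(s) - \vec r(s'))\,\delta(\vec r(s') - \vec r(s'')) \\
& + \frac{y_0}{24}\int_0^N ds \int_0^N ds' \int_0^N ds'' \int_0^N ds'''\, \delta(\vec r(s) - \vec r(s'))\,\delta(\vec r(s') - \vec r(s''))\,\delta(\vec r(s'') - \vec r(s'''))
\end{aligned} \qquad (35)$$

In the following, we will assume that the origin of the chain is fixed at $\vec 0$ and that the
extremity is free.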

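We first consider the replica route, namely we consider typical properties of a typical
chain of the dilute "soup" of the previous section. Introducing replica indices $a, b, \dots$ we
may perform the quenched average in (32) and get

$$\begin{aligned}
\overline{Z^n} = {} & \int \prod_a \mathcal{D}\vec r_a(s)\, \exp\left(-\frac{d}{2a^2}\int_0^N ds \sum_a \dot{\vec r}_a^{\,2}(s) - A\right) \\
& \times \exp\left(\frac{\beta^2\xi^2}{2}\int_0^N ds \int_0^N ds' \int_0^N ds'' \sum_{a,b} \delta(\vec r_a(s) - \vec r_a(s'))\,\delta(\vec r_b(s) - \vec r_b(s''))\right)
\end{aligned} \qquad (36)$$

with

$$\begin{aligned}
A = {} & \frac{1}{2}(v_0 + 2\beta\xi_0)\int_0^N ds \int_0^N ds' \sum_a \delta(\vec r_a(s) - \vec r_a(s')) \\
& + \frac{w_0}{6}\int_0^N ds \int_0^N ds' \int_0^N ds'' \sum_a \delta(\vec r_a(s) - \vec r_a(s'))\,\delta(\vec r_a(s') - \vec r_a(s'')) \\
& + \frac{y_0}{24}\int_0^N ds \int_0^N ds' \int_0^N ds'' \int_0^N ds''' \sum_a \delta(\vec r_a(s) - \vec r_a(s'))\,\delta(\vec r_a(s') - \vec r_a(s''))\,\delta(\vec r_a(s'') - \vec r_a(s'''))
\end{aligned} \qquad (37)$$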
a 

At this stage, there are two possibilities. The first one, due to Edwards and
Muthukumar, is a variational approach to the replicated Hamiltonian of equations (36,37)
through the trial Hamiltonian

$$\beta\mathcal{H}_0 = \frac{1}{2}\int_0^N ds \int_0^N ds' \sum_{a,b} \vec r_a(s)\, g^{-1}_{ab}(s - s')\, \vec r_b(s') \qquad (38)$$

This method is an extension of Des Cloizeaux's calculation for a swollen polymer. The
variational parameter(s) is the $n \times n$ matrix $g_{ab}(s - s')$. This approach has the advantage
that it does not rely on the "ground state dominance" approximation (see appendix B
and below), but it is technically rather heavy. Besides the original work, which dealt
with a polymer chain in a random medium, the only use of this method, in the context of
self-interacting random chains, is, as far as we know, the hydrophilic-hydrophobic chain.
This study deals with the three-dimensional case, in the framework of a one step replica
symmetry breaking scheme for the Parisi-like kernel $g(s,x)$, $x \in [0,1]$. We have chosen
to present first a different approach, more mean-field in character, and then to compare
our results with this variational method.

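The alternative route we have followed uses order parameters, namely we introduce
the overlap $q_{ab}(\vec r,\vec r')$, with $a < b$, and the density $\rho_a(\vec r)$ by:

$$\begin{aligned}
q_{ab}(\vec r,\vec r') &= \int_0^N ds\, \delta(\vec r_a(s) - \vec r)\,\delta(\vec r_b(s) - \vec r') \\
\rho_a(\vec r) &= \int_0^N ds\, \delta(\vec r_a(s) - \vec r)
\end{aligned} \qquad (39)$$

With these definitions, we may write equations (36,37) as:

$$\overline{Z^n} = \int \mathcal{D}q_{ab}(\vec r,\vec r')\,\mathcal{D}\hat q_{ab}(\vec r,\vec r')\,\mathcal{D}\rho_a(\vec r)\,\mathcal{D}\phi_a(\vec r)\, \exp\left(G(q_{ab}, \hat q_{ab}, \rho_a, \phi_a) + \log \zeta(\hat q_{ab}, \phi_a)\right) \qquad (40)$$

where $\hat q_{ab}(\vec r,\vec r')$ and $\phi_a(\vec r)$ are the Lagrange multipliers associated with (39), and:

$$\begin{aligned}
G(q_{ab}, \hat q_{ab}, \rho_a, \phi_a) = {} & \int d^dr \sum_a \left( i\rho_a(\vec r)\phi_a(\vec r) - (v_0 + 2\beta\xi_0)\frac{\rho_a^2(\vec r)}{2} - \frac{w_0'}{6}\rho_a^3(\vec r) - \frac{y_0}{24}\rho_a^4(\vec r) \right) \\
& + \int d^dr \int d^dr' \sum_{a < b} \left( i q_{ab}(\vec r,\vec r')\,\hat q_{ab}(\vec r,\vec r') + \beta^2\xi^2\, q_{ab}(\vec r,\vec r')\,\rho_a(\vec r)\,\rho_b(\vec r') \right)
\end{aligned} \qquad (41)$$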
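and

$$\begin{aligned}
\zeta(\hat q_{ab}, \phi_a) = {} & \int \prod_a \mathcal{D}\vec r_a(s)\, \exp\left(-\frac{d}{2a^2}\int_0^N ds \sum_a \dot{\vec r}_a^{\,2}(s) - i\int_0^N ds \sum_a \phi_a(\vec r_a(s))\right) \\
& \times \exp\left(-i\int_0^N ds \sum_{a < b} \hat q_{ab}(\vec r_a(s), \vec r_b(s))\right)
\end{aligned} \qquad (42)$$

In equation (41), we have defined:

$$w_0' = w_0 - 3\beta^2\xi^2 \qquad (43)$$

According to appendix B, we can rewrite:

$$\zeta(\hat q_{ab}, \phi_a) = \int \prod_a d^dr_a\, \langle \vec r_1 \dots \vec r_n |\, e^{-N H_n(\hat q_{ab}, \phi_a)}\, | \vec 0 \dots \vec 0 \rangle \qquad (44)$$

where $H_n$ is a "quantum-like" $n \to 0$ Hamiltonian, given by:

$$H_n = -\frac{a^2}{2d}\sum_a \nabla_a^2 + i\sum_a \phi_a(\vec r_a) + i\sum_{a < b} \hat q_{ab}(\vec r_a, \vec r_b) \qquad (45)$$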

So far our treatment has been rigorous. Anticipating some kind of (hydrophobically-
driven) collapse, we assume that we can use ground-state dominance to evaluate (44), (45),
and write, omitting some non-extensive prefactors (see appendix B):

$$\zeta \simeq \exp\left(-N \min_{\Psi}\left\{ \langle \Psi | H_n | \Psi \rangle - E_0\left(\langle \Psi | \Psi \rangle - 1\right) \right\}\right) \qquad (46)$$

where $E_0$ is the ground state energy of $H_n$. At this point, the problem is still intractable,
and we make the extra approximation of the saddle-point method (SPM). The extremization
with respect to $q_{ab}$ reads:

$$i\hat q_{ab}(\vec r,\vec r') = -\beta^2\xi^2\, \rho_a(\vec r)\, \rho_b(\vec r') \qquad (47)$$

This equation shows that replica symmetry is not broken (at least at the saddle point
level), since $\hat q_{ab}$ is a product of two single-replica quantities $\rho_a$. To get more analytic
information, we follow appendix B, and use the Rayleigh-Ritz variational principle. Due
to the absence of replica symmetry breaking (RSB), we further restrict the variational
wave-function space to Hartree-like replica-symmetric wave-functions, and write:

$$\Psi(\vec r_1, \dots, \vec r_n) = \prod_{a=1}^{n} \varphi(\vec r_a) \qquad (48)$$

Because of replica symmetry, we can omit replica indices and easily take the $n \to 0$ limit.
The variational free energy now reads:
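$$\begin{aligned}
-\beta F(q, \hat q, \rho, \phi, \varphi) = {} & \int d^dr \left( i\rho(\vec r)\phi(\vec r) - (v_0 + 2\beta\xi_0)\frac{\rho^2(\vec r)}{2} - \frac{w_0'}{6}\rho^3(\vec r) - \frac{y_0}{24}\rho^4(\vec r) \right) \\
& - \frac{1}{2}\int d^dr \int d^dr' \left( i q(\vec r,\vec r')\,\hat q(\vec r,\vec r') + \beta^2\xi^2\, q(\vec r,\vec r')\,\rho(\vec r)\,\rho(\vec r') \right) \\
& - N\left( \int d^dr\, \varphi(\vec r)\left[-\frac{a^2}{2d}\nabla^2 + i\phi(\vec r)\right]\varphi(\vec r) - \frac{i}{2}\int d^dr \int d^dr'\, \hat q(\vec r,\vec r')\,\varphi^2(\vec r)\,\varphi^2(\vec r') \right) \\
& + N E_0\left( \int d^dr\, \varphi^2(\vec r) - 1 \right)
\end{aligned} \qquad (49)$$

The SPM equations read:

$$\begin{aligned}
\rho(\vec r) &= N\varphi^2(\vec r) \\
q(\vec r,\vec r') &= N\varphi^2(\vec r)\,\varphi^2(\vec r') \\
i\phi(\vec r) &= (v_0 + 2\beta\xi_0)\,\rho(\vec r) + \frac{w_0'}{2}\rho^2(\vec r) + \frac{y_0}{6}\rho^3(\vec r) + \beta^2\xi^2 \int d^dr'\, q(\vec r,\vec r')\,\rho(\vec r') \\
i\hat q(\vec r,\vec r') &= -\beta^2\xi^2\, \rho(\vec r)\,\rho(\vec r')
\end{aligned} \qquad (50)$$

We still have to minimize with respect to the normalized wave-function $\varphi(\vec r)$. This leads
to a very complicated non-linear Schrodinger equation, and we shall restrict ourselves to
a one-parameter family of Gaussian wave-functions of the form:

$$\varphi(\vec r) = \left(\frac{1}{2\pi R^2}\right)^{d/4} \exp\left(-\frac{\vec r^{\,2}}{4R^2}\right) \qquad (51)$$

where $R$ is the only variational parameter.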



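Using equations (51) and (49), the variational free energy per monomer ($f = F/N$)
becomes:

$$\beta f = \frac{a^2}{2d}\int d^dr\, (\nabla\varphi)^2 + \frac{(v_0 + 2\beta\xi_0)}{2} N \int d^dr\, \varphi^4 + \frac{w_0'}{6} N^2 \int d^dr\, \varphi^6 + \frac{y_0}{24} N^3 \int d^dr\, \varphi^8 + \frac{\beta^2\xi^2}{2} N^2 \left(\int d^dr\, \varphi^4\right)^2 \qquad (52)$$

Using some Gaussian integrals, and the value of $w_0'$ given in (43), the free energy reads:

$$\begin{aligned}
\beta f = {} & \frac{a^2}{8R^2} + \frac{(v_0 + 2\beta\xi_0)}{2}\, \frac{1}{(2\sqrt{\pi})^d}\, \frac{N}{R^d} \\
& + \frac{1}{6}\left[ \frac{w_0}{(2\pi\sqrt{3})^d} - 3\beta^2\xi^2\left( \frac{1}{(2\pi\sqrt{3})^d} - \frac{1}{(2\pi)^d\, 2^d} \right) \right] \left(\frac{N}{R^d}\right)^2 \\
& + \frac{1}{(32\pi^3)^{d/2}}\, \frac{y_0}{24} \left(\frac{N}{R^d}\right)^3
\end{aligned} \qquad (53)$$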
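The numerical constants of the Gaussian ansatz can be checked by hand. The following worked integrals are a sketch under one assumption, namely that the trial function is normalized so that $\varphi^2$ is a Gaussian of variance $R^2$ per space component, $\varphi^2(\vec r) = (2\pi R^2)^{-d/2} \exp(-\vec r^{\,2}/2R^2)$; this choice reproduces the factors $(2\pi\sqrt{3})^d$ and $(32\pi^3)^{d/2}$ appearing in (53).

$$\int d^dr\, \varphi^{2k}(\vec r) = (2\pi R^2)^{-kd/2} \left(\frac{2\pi R^2}{k}\right)^{d/2}$$

$$k = 2: \ (4\pi R^2)^{-d/2}, \qquad k = 3: \ (2\pi\sqrt{3}\, R^2)^{-d}, \qquad k = 4: \ (32\pi^3 R^6)^{-d/2}$$

$$\int d^dr\, (\nabla\varphi)^2 = \frac{1}{4R^4}\int d^dr\, \vec r^{\,2}\, \varphi^2(\vec r) = \frac{d}{4R^2}$$

Since $2\pi\sqrt{3} < 4\pi$, the bracket multiplying $\beta^2\xi^2$ in the $(N/R^d)^2$ term is positive, so the disorder always lowers the effective three-body coefficient.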

At low temperatures, one has to study the sign of the third term of (53). The repulsive four
body term is necessary to yield a stable theory at low temperatures, when the coefficient of
$(N/R^d)^2$ changes sign, due to disorder fluctuations. A detailed study yields the following
results:

(i) $\xi_0 > 0$: the hydrophilic case.

In our approximation, we find that there is a first order transition towards a collapsed
phase ($R \sim N^{1/d}$) induced by a negative three-body term (and stabilized by a positive
four-body term). This transition is neither an ordinary $\theta$ point (since the two-body term
is positive), nor a freezing point (the replica symmetry is not broken). Such an effect
has been previously found in random polyelectrolytes. Note that since our approach
is variational, the true free energy of the system is lower than the variational one, and
we thus expect the real transition to occur at an even higher temperature. Since the
transition is first order, we expect metastability and retardation effects to be important
near the transition; we also expect that (due to the latent heat), there will be a reduction
of entropy in the low temperature phase, compared to an ordinary second order $\theta$ point.
Furthermore, the three body induced collapse has some non trivial geometry built in.

(ii) $\xi_0 < 0$: the hydrophobic case.

In this case, defining the two temperatures:

$$T_0 = \frac{2|\xi_0|}{v_0}, \qquad T_1 = \xi \sqrt{\frac{3\left(1 - (\sqrt{3}/2)^d\right)}{w_0}} \qquad (54)$$

yields two possible scenarios:

(iia) $T_1 > T_0$: the collapse transition is again driven by the disorder fluctuations
of the three-body interactions. The resulting first-order transition is very similar to the
hydrophilic case (i).

(iib) $T_0 > T_1$: the collapse transition is now driven by the strong two-body $\xi_0$ term.
The resulting phase transition is very similar to an ordinary $\theta$ point, and is therefore
second-order. At low temperature, the collapsed phase undergoes another (first order)
phase transition due to the negative three body term.
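The discontinuous collapse of case (i) can be illustrated with a deliberately schematic mean-field free energy. The sketch below is not the variational calculation of the text: it assumes a uniform monomer density $\rho$, drops the chain entropy and all Gaussian prefactors, and keeps only the structure $f(\rho) = v\rho/2 + w'\rho^2/6 + y_0\rho^3/24$ per monomer, with $v = v_0 + 2\beta\xi_0$ and $w' = w_0 - 3\beta^2\xi^2$. Parameter values and function names are arbitrary illustrative choices.

```python
def free_energy_per_monomer(rho, temperature, v0=1.0, xi0=0.5, w0=1.0, y0=1.0, xi=1.0):
    """Schematic f(rho) = v*rho/2 + w'*rho^2/6 + y0*rho^3/24 (all O(1)
    prefactors dropped), with v = v0 + 2*beta*xi0, w' = w0 - 3*beta^2*xi^2."""
    beta = 1.0 / temperature
    v = v0 + 2.0 * beta * xi0
    w_eff = w0 - 3.0 * beta ** 2 * xi ** 2
    return 0.5 * v * rho + w_eff * rho ** 2 / 6.0 + y0 * rho ** 3 / 24.0


def equilibrium_density(temperature, rho_max=20.0, n_grid=2001, **params):
    """Density minimizing f on a uniform grid; 0.0 means a swollen chain."""
    step = rho_max / (n_grid - 1)
    grid = [k * step for k in range(n_grid)]
    return min(grid, key=lambda rho: free_energy_per_monomer(rho, temperature, **params))


if __name__ == "__main__":
    for t in (2.0, 1.0, 0.95, 0.9, 0.8):
        print(t, equilibrium_density(t))
```

With these hydrophilic parameters ($\xi_0 > 0$, so the two-body term stays repulsive) the minimizing density remains zero at high temperature and then jumps to a finite value once $w'$ is sufficiently negative, i.e. when $w'^2 > 3 v y_0$ in this toy form. The jump is the signature of the first order transition; the positive four-body term is what keeps the collapsed density finite.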

We therefore find that the random hydrophilic-hydrophobic chain has two compact 
phases, separated by a first order phase transition. One may also note that it has two 
swollen "phases", one without metastable states (above the $\theta$ regime), and one with
metastable states (above the discontinuous collapse regime). Note that, in a similar 
vein, one can also find an extra collapsed regime (between the two collapsed phases) dis- 
playing metastability effects. These results were obtained by a saddle point approach 
supplemented by a ground state dominance approximation. The validity of these approx- 
imations is discussed in appendix B, and we only mention here that this approach is not 
really appropriate for the description of metastable states. Lastly, it is of interest to note 
that annealed random chains have a very similar phase diagram. 

At this point, it is interesting to compare our results, which have essentially a mean
field character, with the replica variational approach in $d = 3$. This work finds a phase
diagram in broad agreement with ours, except that the first order transitions become
continuous "one step" freezing transitions. One therefore gets in this approach the two
swollen phases mentioned above. The fact that a replica symmetric mean field theory
turns (for finite dimensions) into a broken replica symmetry theory occurs also in the
random field Ising model. Interestingly enough, the random term in equation (30) can
be rewritten in a "random density" way since

$$\sum_i \xi_i \sum_j \delta(\vec r_i - \vec r_j) = \sum_i \xi_i\, \rho(\vec r_i) \qquad (55)$$

which, in some sense, suggests that this random chain can also be viewed as an Imry-Ma
system. The Imry-Ma domain arguments depend on non-extensive (free) energies and are
not easily interpreted in the framework of replica theory. It would be much nicer to work
with a fixed $\{\xi_i\}$ distribution: in this respect, one may think of a disorder dependent
variational method, and/or a domain size analysis.

Finally, we close this section by some protein related comments. One may say that
"good" $\{\xi_i\}$ sequences (i.e. easily foldable) should not be trapped in a metastable state,
and should yield stable geometrical shapes. This means that a good sequence should
enter neither of the two swollen phases, since one has potentially trapping metastable
states, and the other leads to a $\theta$ collapsed phase which has no definite shape (extensive
conformational entropy). In this picture, a good sequence should be, in some appropriate
phase diagram, on the dividing edge between the two swollen phases: the folding transition
would then be a multicritical point, which seems reasonable from a physical standpoint.
This remark is probably related to some dynamical criteria that may characterize
good folders in various heteropolymer models.

2.6 The random bond chain. 

There are various ways to treat this difficult case, and we will present our point of
view along two lines. We will first consider the very high dimensional approach, where
the chain constraint is irrelevant. In this case, one may show directly that a collapsed
phase undergoes a Random Energy Model (REM) freezing transition. One may also
follow the same path as for the hydrophilic-hydrophobic chain, namely to use ground
state dominance plus some saddle point approximation. The main difference here is that
one is faced with a broken replica symmetry saddle point so that the variational wave
function is not replica symmetric.

Very high dimension approach. 

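Let us consider the Hamiltonian:

$$\beta\mathcal{H}(\{w_{ij}\}) = \frac{1}{2}\sum_{i \neq j} (v_0 + \beta w_{ij})\,\delta(\vec r_i - \vec r_j) + \frac{1}{6}\sum_{i \neq j \neq k} w_0\,\delta(\vec r_i - \vec r_j)\,\delta(\vec r_j - \vec r_k) \qquad (56)$$

where the couplings $\{w_{ij}\}$ are random independent couplings, and $v_0$ represents the overall
effect of the solvent, as well as the direct non random pair interactions. For the sake of
simplicity, we use a Gaussian probability distribution for the couplings:

$$p(w_{ij}) = \frac{1}{\sqrt{2\pi w^2}} \exp\left(-\frac{w_{ij}^2}{2w^2}\right) \qquad (57)$$

The partition function is:

$$Z(\{w_{ij}\}) = \int \prod_i d\vec r_i \prod_i g(\vec r_i, \vec r_{i+1}) \, \exp\left(-\beta\mathcal{H}(\{w_{ij}\})\right) \qquad (58)$$

where the function $g(\vec r_i, \vec r_{i+1})$ again enforces the chain constraint.

Replicating and averaging (58) yields:

$$\begin{aligned}
\overline{Z^n} = \int \prod_{i,a} d\vec r_i^{\,a} \prod_{i,a} g(\vec r_i^{\,a}, \vec r_{i+1}^{\,a})\, \exp\Big( & -v_0' \sum_{i<j}\sum_a \delta(\vec r_i^{\,a} - \vec r_j^{\,a}) - \frac{w_0}{6}\sum_{i \neq j \neq k}\sum_a \delta(\vec r_i^{\,a} - \vec r_j^{\,a})\,\delta(\vec r_j^{\,a} - \vec r_k^{\,a}) \\
& + \frac{\beta^2 w^2}{2}\sum_{a \neq b}\sum_{i<j} \delta(\vec r_i^{\,a} - \vec r_j^{\,a})\,\delta(\vec r_i^{\,b} - \vec r_j^{\,b}) \Big)
\end{aligned} \qquad (59)$$

with $v_0' = v_0 - \beta^2 w^2/2$.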

The above replicated Hamiltonian has three characteristics:

(i) the chain constraint term.

(ii) a possible $\theta$ point if $v_0' < 0$.

(iii) a possible freezing transition due to the $a \neq b$ term of (59).

Since we wish to emphasize here the freezing transition, we will assume that $v_0'$ is
indeed negative, so that the system is in the collapsed phase. To get an easily tractable
model, we further consider a simple but unrealistic geometry, namely a collapsed chain
on a fully connected lattice. On such a lattice, by definition, each point is a neighbor to
all the other points of the lattice. Examples are provided by a triangle (three points, two
dimensions), a tetrahedron (four points, three dimensions), ... so that for a large number
$\Omega$ of points, it is a high dimensional ($\Omega - 1$) polyhedron. On such a lattice, the chain
constraint (i) is automatically satisfied, since a site has $\Omega - 1$ neighbors, so that the chain
constraint is a $1/\Omega$ effect.

Let us show that in the collapsed phase, the model is equivalent to a Random Energy
Model (REM). For that purpose, we will show that the energies of this system are random
independent Gaussian variables. First, we note that in the collapsed phase, the monomer
density $\rho$ is finite and constant in space; only a number $N/\rho$ of sites are occupied. This
implies that the only conformation dependent term of (56) is the random two-body term
$\sum_{i<j} w_{ij}\,\delta(\vec r_i - \vec r_j)$. Since this term is a linear combination of Gaussian variables, it is also
a Gaussian variable, and thus, its distribution is entirely characterized by its correlation
functions. Therefore, instead of computing the joint probability $P(E_1, E_2)$ for two copies
of the chain, we will calculate directly the correlation $\overline{E_1 E_2}$ between the energies $E_1$ and
$E_2$ of two conformations $\{\vec r_i^{\,(1)}\}$ and $\{\vec r_i^{\,(2)}\}$ of the chain in the collapsed phase. We have:

$$\overline{E_1^2} = w^2 \sum_{i<j} \delta(\vec r_i^{\,(1)} - \vec r_j^{\,(1)}) = \frac{w^2}{2}\sum_{\vec r} \rho_1^2(\vec r) = \frac{w^2}{2} N\rho \qquad (60)$$

where $\rho_1(\vec r) = \sum_i \delta(\vec r - \vec r_i^{\,(1)})$. Therefore, $\overline{E_1^2}$ is independent of the (collapsed) conformation.
Similarly, we have:

$$\overline{E_1 E_2} = w^2 \sum_{i<j} \delta(\vec r_i^{\,(1)} - \vec r_j^{\,(1)})\,\delta(\vec r_i^{\,(2)} - \vec r_j^{\,(2)}) = \frac{w^2}{2}\sum_{\vec r,\vec r'} q_{12}^2(\vec r,\vec r') \qquad (61)$$

where $q_{12}(\vec r,\vec r') = \sum_i \delta(\vec r - \vec r_i^{\,(1)})\,\delta(\vec r' - \vec r_i^{\,(2)})$. We have:

$$\sum_{\vec r,\vec r'} q_{12}(\vec r,\vec r') = N \qquad (62)$$

and since all occupied points are equivalent (constant density), we have:

$$q_{12}(\vec r,\vec r') = \frac{\rho^2}{N} \qquad (63)$$

which in turn implies:

$$\overline{E_1 E_2} = \frac{w^2}{2}\rho^2 \qquad (64)$$

from which we get the joint probability:

$$\lim_{N \to \infty} P(E_1, E_2) \sim \exp\left(-\frac{E_1^2 + E_2^2}{N w^2 \rho}\right) \qquad (65)$$

leading to a REM behaviour. This behaviour, in the context of protein folding, was first
put forward, on phenomenological grounds, by Bryngelson and Wolynes.
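The counting behind (60) and (64) can be verified directly. The sketch below assumes an idealized collapsed state in which every occupied site of the fully connected lattice carries exactly `density` monomers; the function names are hypothetical. For two independent random conformations it counts the contacts of one conformation (proportional to the variance of $E_1$, extensive in $N$) and the contacts shared by both (proportional to the covariance $\overline{E_1 E_2}$, which stays of order one).

```python
import random
from collections import Counter


def collapsed_conformation(n_monomers, density, rng):
    """Assign monomers to n_monomers // density sites, `density` per site.
    On a fully connected lattice any such assignment respects the chain."""
    order = list(range(n_monomers))
    rng.shuffle(order)
    site_of = [0] * n_monomers
    for position, monomer in enumerate(order):
        site_of[monomer] = position // density
    return site_of


def pair_count(labels):
    """Number of monomer pairs i < j carrying the same label."""
    return sum(c * (c - 1) // 2 for c in Counter(labels).values())


def contact_overlap(n_monomers=3000, density=3, seed=2):
    """Return (own, shared): contacts in conformation 1, and contacts present
    in both conformations 1 and 2 (n_monomers must be a multiple of density)."""
    rng = random.Random(seed)
    conf1 = collapsed_conformation(n_monomers, density, rng)
    conf2 = collapsed_conformation(n_monomers, density, rng)
    own = pair_count(conf1)                       # sum_{i<j} delta(r_i^1 - r_j^1)
    shared = pair_count(list(zip(conf1, conf2)))  # same pair in contact in both
    return own, shared
```

The exact count of own contacts is $N(\rho - 1)/2$ rather than the $N\rho/2$ of (60), because the sketch excludes the $i = j$ terms; the point is that `own` grows linearly with $N$ while `shared` does not, so the energies of two conformations decorrelate for large $N$.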

It is of course possible to obtain these results using replicas. In this case, one shows
that the chain model as we have stated it, is equivalent to an infinite-range Potts-glass,
with $N/\rho \to \infty$ states. This model can then be solved by a one-step replica symmetry
breaking scheme. The physics of the REM is discussed elsewhere in this book: the chain
undergoes a freezing transition at a temperature $T_c$. Above $T_c$, the system has a finite
entropy, whereas below $T_c$, it vanishes. The system is then frozen in a small number of
dominant states, determined by subtle non extensive effects. This is why this model is so
appealing for protein folding.
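The vanishing of the entropy below $T_c$ is easy to observe in a direct numerical sampling of a REM. The sketch below is a generic illustration with assumed conventions, not the polymer model itself: $2^n$ independent Gaussian energy levels of variance $n/2$, for which the standard REM result is $T_c = 1/(2\sqrt{\ln 2}) \approx 0.60$ in these units. The function name is hypothetical.

```python
import math
import random


def rem_entropy_per_monomer(n, temperature, seed=3):
    """Entropy per degree of freedom, s = (log Z + beta * <E>) / n, for one
    sample of a REM with 2**n Gaussian levels of variance n/2 (assumed units)."""
    rng = random.Random(seed)
    sigma = math.sqrt(n / 2.0)
    energies = [rng.gauss(0.0, sigma) for _ in range(2 ** n)]
    beta = 1.0 / temperature
    e_min = min(energies)
    # Shift by the lowest level so that every Boltzmann weight is <= 1.
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    z = sum(weights)
    mean_energy = sum(w * e for w, e in zip(weights, energies)) / z
    log_z = math.log(z) - beta * e_min
    return (log_z + beta * mean_energy) / n


if __name__ == "__main__":
    for t in (2.0, 1.0, 0.6, 0.4, 0.2):
        print(t, rem_entropy_per_monomer(16, t))
```

Well above $T_c$ the entropy per degree of freedom stays close to $\ln 2 - 1/(4T^2)$, while well below $T_c$ only a handful of low-lying levels carry the Boltzmann weight and the entropy per degree of freedom is of order $1/n$.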

High dimension approach. 

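To go beyond the mean field REM freezing behaviour, one has to take into account the
chain constraint $g(\vec r_i^{\,a}, \vec r_{i+1}^{\,a})$ in (59). We have

$$\begin{aligned}
\overline{Z^n} = \int \prod_{i,a} d\vec r_i^{\,a} \prod_{i,a} g(\vec r_i^{\,a}, \vec r_{i+1}^{\,a})\, \exp\Big( & -v_0' \sum_{i<j}\sum_a \delta(\vec r_i^{\,a} - \vec r_j^{\,a}) - \frac{w_0}{6}\sum_{i \neq j \neq k}\sum_a \delta(\vec r_i^{\,a} - \vec r_j^{\,a})\,\delta(\vec r_j^{\,a} - \vec r_k^{\,a}) \\
& + \frac{\beta^2 w^2}{2}\sum_{a \neq b}\sum_{i<j} \delta(\vec r_i^{\,a} - \vec r_j^{\,a})\,\delta(\vec r_i^{\,b} - \vec r_j^{\,b}) \Big)
\end{aligned} \qquad (66)$$

that we now rewrite as

$$\begin{aligned}
\overline{Z^n} = {} & \int \prod_a \mathcal{D}\vec r_a(s)\, \exp\left(-\frac{d}{2a^2}\int_0^N ds \sum_a \dot{\vec r}_a^{\,2}(s) - A\right) \\
& \times \exp\left(\frac{\beta^2 w^2}{2}\int_0^N ds \int_0^N ds' \sum_{a < b} \delta(\vec r_a(s) - \vec r_a(s'))\,\delta(\vec r_b(s) - \vec r_b(s'))\right)
\end{aligned} \qquad (67)$$

with

$$\begin{aligned}
A = {} & \frac{1}{2}\, v_0' \int_0^N ds \int_0^N ds' \sum_a \delta(\vec r_a(s) - \vec r_a(s')) \\
& + \frac{w_0}{6}\int_0^N ds \int_0^N ds' \int_0^N ds'' \sum_a \delta(\vec r_a(s) - \vec r_a(s'))\,\delta(\vec r_a(s') - \vec r_a(s''))
\end{aligned} \qquad (68)$$

Defining, as in equation (39), the parameters $q_{ab}(\vec r,\vec r')$, with $a < b$, and $\rho_a(\vec r)$, and
introducing the associated Lagrange multipliers $\hat q_{ab}(\vec r,\vec r')$ and $\phi_a(\vec r)$, one gets:

$$\overline{Z^n} = \int \mathcal{D}q_{ab}(\vec r,\vec r')\,\mathcal{D}\hat q_{ab}(\vec r,\vec r')\,\mathcal{D}\rho_a(\vec r)\,\mathcal{D}\phi_a(\vec r)\, \exp\left(G(q_{ab}, \hat q_{ab}, \rho_a, \phi_a) + \log \zeta(\hat q_{ab}, \phi_a)\right) \qquad (69)$$

with

$$\begin{aligned}
G(q_{ab}, \hat q_{ab}, \rho_a, \phi_a) = {} & \int d^dr \sum_a \left( i\rho_a(\vec r)\phi_a(\vec r) - v_0'\frac{\rho_a^2(\vec r)}{2} - \frac{w_0}{6}\rho_a^3(\vec r) \right) \\
& + \int d^dr \int d^dr' \sum_{a < b} \left( i\hat q_{ab}(\vec r,\vec r')\, q_{ab}(\vec r,\vec r') + \frac{\beta^2 w^2}{2}\, q_{ab}^2(\vec r,\vec r') \right)
\end{aligned} \qquad (70)$$

and

$$\begin{aligned}
\zeta(\hat q_{ab}, \phi_a) = {} & \int \prod_a \mathcal{D}\vec r_a(s)\, \exp\left(-\frac{d}{2a^2}\int_0^N ds \sum_a \dot{\vec r}_a^{\,2}(s)\right) \\
& \times \exp\left(-i\int_0^N ds \sum_a \phi_a(\vec r_a(s)) - i\int_0^N ds \sum_{a < b} \hat q_{ab}(\vec r_a(s), \vec r_b(s))\right)
\end{aligned} \qquad (71)$$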



To go further, one follows the same approximations as in section 2.5:

(i) one assumes ground state dominance in the "quantum" Hamiltonian associated
with equation (71).

(ii) the free energy is calculated by the SPM.

In view of the high dimension results, point (i) is natural, since one expects first a $\theta$
collapse transition, followed at low temperature by a freezing transition driven by the
off-diagonal (in replica space) terms. The procedure exactly parallels the one of the randomly
hydrophilic-hydrophobic chain of the preceding section, except for the variational wave
function that enters the Rayleigh-Ritz principle. The replica symmetric form extracted
from equation (47) is not valid anymore: the variational wave function should present
replica symmetry breaking. A simple form was proposed by Shakhnovich and Gutin,
and reads:

^(^"i' = 7^;^ (^-\ E ^^aKabr)j (72) 

where $K$ is an $n \times n$ Parisi-like hierarchical matrix, and $d$ is the dimension of
space. The variational free energy is extremized with respect to $K$. The result, for large
enough $d$, is a step function form for $K(x)$ ($x \in [0, 1]$), corresponding to a REM type of
replica symmetry breaking. We briefly recall the physical meaning of an $x$-dependent length
scale (see equation (72)). The overlap parameter of equation (39) can be understood by
considering two real chains $\vec r_1(s)$ and $\vec r_2(s)$, with the same disorder configuration $\{v_{ij}\}$,
coupled through an infinitesimal term of the form

$$\mathcal{H}_{12} = \epsilon \int_0^N ds\, \delta(\vec r_1(s) - \vec r_2(s)) \qquad (73)$$

It can be shown, as in spin glasses, that the Parisi order parameter $q(x) = \lim_{n \to 0} q_{ab}$
is identical to the average $q_{12} = \langle \delta(\vec r_1(s) - \vec r_2(s)) \rangle$ taken over the two (real) chains
Hamiltonian. Small $x$ corresponds to the average overlap over large distance (of the order
of the global radius of gyration), whereas $x \sim 1$ corresponds to an average overlap over
a microscopic distance (of the order of a bond length). In this picture, the $x$ dependence
stems from the existence of many different local minima in the two chain system. These
minima can be probed with an infinitesimal field $\epsilon$.

It is unclear to us whether this analysis applies for $d = 3$, but a variational wave
function of the form (72) has been widely used in other disordered condensed matter
situations: vortex lattice with impurities in superconductors, interface in a random
potential, amorphous solidification of vulcanized macromolecules, ...

The previous approaches to the random bond chain study a freezing transition in the
collapsed regime. Using the Edwards-Muthukumar approach, it should be possible to
study also the direct transition from the swollen phase to the frozen phase. This transition is seen in
low dimensional simulations, yielding a phase diagram in broad agreement with the one of
the preceding section. For essentially the same reasons as in section 2.5, one may define
"good folders" as heteropolymers undergoing the folding transition at a multicritical point.

2.7 The "random sequence" chain

We now turn to the last case, namely the "random sequence" chain. As mentioned in
section 2.4, this model describes two very different physical realities, namely randomly
charged (globally neutral) polymers (also called polyampholytes) and random AB
copolymers. The random variables $\{\xi_i\}$ of equation (28) will be assumed to be independent. We
first present a general strategy for the two cases along the lines of section 2.5.



The "standard" approach 

Consider a "random sequence" chain described by a two body interaction:

$$v_{ij}(\vec r_i - \vec r_j) = v_0(\vec r_i - \vec r_j) + \epsilon\beta\, b(\vec r_i - \vec r_j)\,\xi_i \xi_j \qquad (74)$$

where the probability distribution $h(\xi)$ is given by equation (33), with $\xi_0 = 0$, and $\epsilon = 1$
(resp. $\epsilon = -1$) applies to the polyampholyte (resp. copolymer) case. Following the same
route as above, we get:



Z" = j Vqab{r,T^)Vqab{r,r)Vpa{r)V4)a{r)V^a{r) exp (G(5a6, ^af,, p^, ^a) + log C(ga6, </<«)) (75) 
where $\hat q_{ab}(\vec r,\vec r')$ and $\phi_a(\vec r)$ are again the Lagrange multipliers associated with (39), and:

G{qabAab,Pa.<t>a.^a) = J d^vY^ (^^Pa{f1Ur) " ^0 ^ " ^P^(r) " ^P-i^^^^^ 

+ jd''j d''r'Y,(iqab{r,f)qab{r,f') - e ^a(r)ga6(r, r-')^6(r")) (76) 



a<b 
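and

$$\begin{aligned}
\zeta(\hat q_{ab}, \phi_a) = {} & \int \prod_a \mathcal{D}\vec r_a(s)\, \exp\left(-\frac{d}{2a^2}\int_0^N ds \sum_a \dot{\vec r}_a^{\,2}(s) - i\int_0^N ds \sum_a \phi_a(\vec r_a(s))\right) \\
& \times \exp\left(-i\int_0^N ds \sum_{a < b} \hat q_{ab}(\vec r_a(s), \vec r_b(s))\right)
\end{aligned} \qquad (77)$$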



Clearly, one may run through the same analysis as was done before (saddle point 
method plus ground state dominance approximation). It is easy to see that the SPM 
yields a replica symmetric solution, for the same reasons as in section 2.5. For instance, 
the saddle point equation for ^'a(^) reads 

d''r'b-\r-f^)^a{f')-f ^Pa{ma{f^+ I d''r'J2qab{ry)Mf')j =0 (78) 

At this stage, two different cases have to be considered. For short range forces ((a) random
AB copolymer chain, (b) polyampholyte chain with salt), the saddle point equations yield
either a macroscopic phase separation (case (a)) or a macroscopic charge crystallization
(case (b)). This clearly shows that in this case, the SPM is a rather poor approximation
and does not treat properly the chain constraint. For case (a) above, it can easily be shown
that a macroscopic phase separation would only occur in the unrealistic fully connected
geometry of section 2.6.



Beyond the saddle point method: short range interactions 

The solution to this problem has been found by Leibler in the case of a melt of
(non-random) AB block copolymers: when one goes beyond the SPM, there appears a new
length scale $\ell^* \sim \sqrt{N}$, which is the spatial scale for phase separation between A-rich and
B-rich regions ($N$ is the length of one copolymer chain). In other words, the Fourier
components of the order parameter have critical fluctuations for all wave vectors on a sphere of
radius $|\vec k^*| \sim 1/\ell^*$. Due to this continuous symmetry, it was shown by Brazovskii that,
below $d = 6$, thermodynamic fluctuations turn this transition into a first order transition.
Depending upon the content of A's and B's in the chains, one expects various types
of modulated ordered phases (lamellar, hexagonal, body centered cubic, ...) for the order
parameter. The random AB melt presents the same type of modulated order, added
to a possible replica symmetry breaking phenomenon induced by the order parameter
$q_{ab}(\vec r,\vec r')$ of equation (75). A similar situation arises for the short range polyampholyte
melt. The discussion of the "random sequence" melt is therefore quite complicated, due
to the interplay of a non zero wave vector ordering and of disorder. Since the techniques
and results are specifically linked to the notion of a polymeric melt, we refer the interested
reader to the literature where phase diagrams involving modulated and/or frozen
structures have been proposed. Lastly, the dynamics of this model may help to bridge the
gap with real glasses, since the peculiar features of Brazovskii's phase transition lead to
a large number of symmetry unrelated metastable phases above the critical temperature.
As a temporary conclusion to the disordered short range interactions case, we wish to
make some remarks which pertain to the single chain problem.

(1) for a melt, the existence of the wave vector $\vec k^*$ can be established either through
replica calculations or directly (see equation (16)): we see no reason why the associated
length scale, in the single chain problem, should not be distributed, as is frequently found
in domain arguments.

(2) for screened Coulomb interaction (polyampholyte with salt), numerical evidence
supports the existence of a $\theta$ transition for a neutral chain in $d = 3$. As far as
we know, no evidence exists for a modulated or frozen phase at very low temperatures.
Finally we mention that a one dimensional version of a directed chain shows a very non
trivial freezing transition.



Beyond the saddle point method: long range interactions 

For long ranged interactions (polyampholytes without salt), the saddle point equation
(78) is indeed reminiscent of the Poisson-Boltzmann equation. Again the chain constraint
would be poorly treated at this stage, and one would have to go beyond the SPM as for
the short range case. Moreover, it is not clear that the free energy of the chain is self
averaging in the presence of Coulomb forces. We will therefore appeal to (more classical)
electrostatic arguments (see the cited literature and references therein).

Consider a soup of neutral polyampholyte chains of $N$ monomers, each of length $\ell$. For
each chain, monomer $i$ is given a charge $q_i = \pm q_0$, with probability $h(+) = h(-) = 1/2$
(of course one has $b(x) = 1/|x|^{d-2}$).

For reasons that will soon become clear, one has to distinguish two different situations,
both of experimental interest: one may either consider

(i) an ensemble of random chains where the neutrality constraint holds separately for
each chain or

(ii) an ensemble of random chains which are globally neutral, implying that each chain
has, up to a random sign, a typical charge $Q \sim q_0\sqrt{N}$.

In the former case, the low temperature phase is collapsed, since, as shown by Higgs
and Joanny, one then gains an electrostatic condensation energy of order $-N q_0^2/\ell^{d-2}$.
The transition is very similar to an ordinary $\theta$ point with renormalized excluded volume.
Furthermore there is some numerical evidence that there may exist, at still lower
temperature, a freezing transition of the REM type.

In the latter case, the excess charge $Q$ may lead to a swelling of the chains. Its
associated electrostatic energy $E_Q$ is of order $Q^2 / R^{d-2}$, where $R$ is the radius of gyration.

This suggests that above $d = 4$, this electrostatic contribution may be irrelevant since
we then have $R \sim N^{1/2}$, implying $E_Q \sim N^{(4-d)/2}$. One then expects a situation very
similar to case (i) above. Below $d = 4$, the two cases are different. The role of the
Coulomb interaction in $d = 3$ is not yet settled, although there is some evidence that
the behaviour of the chain is controlled by a parameter $\alpha = Q/Q_R$, where $Q_R$ is the
Rayleigh charge well known in the instability of a spherical charged droplet. For small $\alpha$,
the polyampholyte chain is collapsed at low temperature, whereas it is swollen for large $\alpha$,
possibly into an elongated necklace of collapsed beads. Again, a freezing transition is still
possible at lower temperature.
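
The scaling estimate used above can be written out in one line. This is only a sketch: it assumes the ideal-chain size $R \sim l N^{1/2}$ and the typical excess charge $Q \sim q_0 N^{1/2}$ of case (ii), and it drops all numerical prefactors.

```latex
E_Q \sim \frac{Q^2}{R^{d-2}}
    \sim \frac{q_0^2\, N}{l^{\,d-2}\, N^{(d-2)/2}}
    = \frac{q_0^2}{l^{\,d-2}}\, N^{(4-d)/2}
```

For $d > 4$ the exponent is negative, so the excess-charge energy vanishes for long chains; for $d < 4$ it grows with $N$, which is why cases (i) and (ii) differ in $d = 3$.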

2.8 Conclusions



We have presented some of the problems linked with the "random sequence" melt, where
modulated and/or frozen phases may appear, that one may interpret in terms of (more or
less) finite range phase separation or charge condensation. These phases show up beyond
the saddle point approximation. The long range character of Coulomb interactions may be
a further difficulty. The discreteness of the chain can also be the source of complications
(commensurability effects,...). The relevance of these results for a single chain (and maybe
for protein folding) is unclear, to say the least. It is tempting to assume that similar
considerations apply once the chain has undergone a collapse $\theta$ transition, but we believe
that new theoretical methods must be found for the one chain problem. On the protein
side, only five (His, Lys, Arg, Asp, Glu) among twenty of the amino acids are charged
and their positions along the chemical sequence are somehow correlated. Furthermore,
the rather small size of realistic proteins probably makes any freezing "transition" a non
REM "transition".

2.9 Heteropolymers and proteins 

A general feature of the above heteropolymer models is the existence of several compact 
phases, with widely different entropies (and other characteristics as well). We have also 
pointed out that one may have different coil phases, with widely different dynamical 
behaviour. The theoretical methods we have presented can certainly be criticized, and we 
recall some of their weak points: 

(i) the results are obtained in the limit $N \to \infty$.

(ii) we have basically used the SPM supplemented by a ground state dominance ap- 
proximation. The need to go beyond the SPM is clear in the "random sequence" melt, 
but the single chain problem seems presently out of reach. Furthermore, the ground state 
approximation is not appropriate to describe the coil globule transition. 

(iii) we have made calculations over disorder averaged quantities, instead of consider- 
ing a fixed disorder configuration. For a single chain, the difference can be important, and 
an approach along the lines of appendix B should be interesting. Furthermore, we have 
assumed, for simplicity, that the disorder variables were uncorrelated, which is certainly
not realistic. It is therefore of interest to mention a recent extension @ of the REM to 
take energy correlations into account. 

As for proteins, there is a general agreement about their being non-random, either from
a hydrophobic point of view, or from a Coulombic point of view. This non randomness
has been also emphasized in a dynamical context. Nevertheless, the comparison
between heteropolymers and proteins has proven a useful idea: one may mention, among
other features, the existence of a molten globule as another distinct compact phase
of proteins, the successful interpretation of some (thermo)dynamical folding experiments
by REM or other glassy models (see e.g. e1I'E3E3E3), or the tentative design of simplified
folding potentials, of fast folding sequences (for recent references see e.g. r^r^). We
emphasize once more that, from the heteropolymer point of view, a "good" folder should
follow, in some appropriate phase diagram, a dividing edge from the coil state onto a
folded state, through a multicritical folding transition.

3 Dynamics of proteins

In this section, we shall review some theories of protein dynamics. Roughly speaking, one
can distinguish two main approaches to this problem. 

The first approach, of a microscopic nature, is based on the dynamics of the polymer 
models discussed in the previous sections. It thus encompasses the topological frustration 
induced by the chain constraint, and the steric hindrance. Considering the difficulties of 
the thermodynamics, this approach is still at a very preliminary stage, especially in the 
collapsed phase, where entanglements and topological frustration effects are dominant. To 
further emphasize the difficulty of the problem, let us mention that there is, at present, 
no satisfactory theory for the dynamics of the collapse of a homopolymer chain. As one 
can imagine, the problem is orders of magnitude harder for heteropolymer chains. 

The second approach, of a rather phenomenological nature, relies on the strong sim- 
ilarities between the protein folding problem and the random energy model (REM), and 
reduces somehow to variants of the dynamics of the REM. This approach concentrates on 
the roughness of the energy landscape due to energetic frustration. 

In a first section, following the microscopic route we study the dynamics of collapse of 
a homopolymer chain. This may be useful to describe the first stages of the hydrophobic 
collapse of a protein, since it does not crucially depend on the specific nature of the inter- 
actions. We also briefly discuss various attempts towards a description of heteropolymer 
dynamics along these lines. 

Finally, we present some dynamical approaches to the REM, as an oversimplified 
protein folding scheme. 



3.1 Dynamics of the collapse of a polymer chain 

In this section, we study the dynamics of protein folding. At the very early stages of pro- 
tein folding, the driving force is believed to be the hydrophobic force; it is thus reasonable, 
in a first approximation, to neglect the amino acid sequence, and model the system as a 
homopolymer in a bad solvent. 

According to de Gennes' theory, the collapse of a flexible coil leads to the formation
of "pearls" on a minimal scale along the linear chain, which thickens and shortens under 
diffusion of the monomers, then forms new pearls at a larger scale, until the final state 
of a compact globule is reached; the longest timescale for the collapse, neglecting knot 
formation, is estimated as 



ksO VI AT 

where $\eta$ is the viscosity of the solvent, $\theta$ is the temperature of the $\theta$ point, $a$ is the
monomer size and $\Delta T$ is the temperature quench from the $\theta$ temperature. This time
has a strong dependence on molecular weight. For the case of proteins (see reference!^), 
$N = 300$, this yields a collapse time of $\tau_c \sim 1$ s for a temperature quench of $\Delta T/\theta = 0.01$.

In a series of articles, Timoshenko et al. have developed an alternative theory based
on a self-consistent method using Langevin equations that can be analyzed numerically;
kinetic laws for the collapse of a homopolymer are obtained with or without hydrodynamic
interactions, at early and later stages. In a recent article, a generalization of this
method has been applied to the dynamics of a hydrophilic-hydrophobic heteropolymer.

In the following, we shall present an analytical method to study the kinetics of a
homopolymer in a solvent when it is quenched into bad solvent conditions (collapse into
a globule).

Presentation of the method 

We consider a homopolymer chain in $\theta$ conditions (i.e. a Gaussian coil) consisting of $N$
monomers, obeying the Langevin dynamics as the chain is quenched into good or bad
solvent conditions (equations (79) and (80)).

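Let us first neglect all hydrodynamic interactions. To keep the notations as simple as
possible, we will omit the arrows on the vectors. The equations of motion for the system
read:

$$\zeta \frac{\partial r(s,t)}{\partial t} = -\frac{\partial H}{\partial r(s,t)} + \eta(s,t) \tag{79}$$

$$H = \frac{d\, k_B T}{2 a_0^2} \int_0^N \left( \frac{\partial r(s,t)}{\partial s} \right)^2 ds + V(r(s,t)) \tag{80}$$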

where $N$ is the total number of monomers, $r(s,t)$ is the position of monomer $s$ in the
chain, $a_0$ is the monomer length, $\zeta = k_B T / D$ is the friction coefficient, $D$ is the diffusion
constant of a monomer in the solvent and $k_B T$ is the temperature. The intra-molecular as
well as intermolecular interactions of the chain are contained in the potential $V(r(s,t))$.
The thermal noise $\eta(s,t)$ is a Gaussian noise with zero mean and correlation given by:

$$\langle \eta(s,t)\, \eta(s',t') \rangle = 2 \zeta k_B T\, \delta(s - s')\, \delta(t - t')$$

The method consists in finding a virtual homopolymer chain which obeys a simpler 
Langevin equation, chosen so that its radius of gyration best approaches the radius of 
gyration of the real chain at each time t. 

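The virtual chain, defined by $r^{(v)}(s,t)$, satisfies the Langevin equation:

$$\zeta \frac{\partial r^{(v)}(s,t)}{\partial t} = -\frac{\partial H_v}{\partial r^{(v)}(s,t)} + \eta(s,t) \tag{81}$$

$$H_v = \frac{d\, k_B T}{2 a^2(t)} \int_0^N \left( \frac{\partial r^{(v)}(s,t)}{\partial s} \right)^2 ds \tag{82}$$

with the same friction coefficient and noise as the original equation, but with a simpler
Hamiltonian $H_v$. Indeed this Hamiltonian represents a Gaussian chain, but with a
time dependent Kuhn length $a(t)$.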

Our method is a generalization of Edwards' uniform expansion model to dynamics.
This method consists in calculating the radius of gyration of a polymer by using perturbation
theory, and adjusting the simplified Hamiltonian so that the first order perturbation
to the radius of gyration vanishes. If $v$ denotes the excluded volume, the method gives
the Flory radius for large $v$ and agrees with the result of the first-order perturbation
expansion for small $v$. Note that it would seem natural to use the most general quadratic
Hamiltonian rather than that of (82), but this was shown by des Cloizeaux to yield
the incorrect exponent $\nu = 2/d$.

Let us define

$$\chi(s,t) = r(s,t) - r^{(v)}(s,t)$$
$$W = H - H_v$$

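Assuming that (81) is a good approximation to (79), $\chi(s,t)$ and $W$ can be regarded as
small, and to first order in these quantities, the dynamical equations become:

$$\zeta \frac{\partial r^{(v)}}{\partial t} = \frac{d\, k_B T}{a^2(t)} \frac{\partial^2 r^{(v)}}{\partial s^2} + \eta(s,t) \tag{83}$$

$$\zeta \frac{\partial \chi}{\partial t} = \frac{d\, k_B T}{a^2(t)} \frac{\partial^2 \chi}{\partial s^2} + d\, k_B T \left( \frac{1}{a_0^2} - \frac{1}{a^2(t)} \right) \frac{\partial^2 r^{(v)}}{\partial s^2} + F(r^{(v)}(s,t)) \tag{84}$$

where $F(r(s,t)) = -\frac{\partial V}{\partial r(s,t)}$ is the driving force for the swelling or collapse of the chain.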

More precisely, in the following, for a chain in a bad solvent, we will take attractive
two-body interactions and repulsive three-body interactions:

$$V(r(s,t)) = -V_2(r(s,t)) + V_3(r(s,t))$$

$$\begin{aligned}
V(r(s,t)) = & -\frac{v}{2}\, k_B T \int_0^N ds \int_0^N ds'\, \delta(r(s,t) - r(s',t)) \\
& + \frac{w}{6}\, k_B T \int_0^N ds \int_0^N ds' \int_0^N ds''\, \delta(r(s,t) - r(s',t))\, \delta(r(s',t) - r(s'',t)),
\end{aligned}$$

where $v > 0$ and $w > 0$.

In this approximation, the radius of gyration of the chain becomes:

$$R_g^2 = \frac{1}{N} \int_0^N \langle r^2(s,t) \rangle\, ds \tag{85}$$

$$R_g^2 \simeq \frac{1}{N} \int_0^N \left\langle \left( r^{(v)} \right)^2(s,t) + 2\, r^{(v)}(s,t)\, \chi(s,t) \right\rangle ds \tag{86}$$

The brackets denote the thermal average (that is, an average over the Gaussian noise
$\eta(s,t)$). Our approximation consists in choosing the parameter $a(t)$ in such a way that
the first order in (86) vanishes:

$$\int_0^N \langle r^{(v)}(s,t)\, \chi(s,t) \rangle\, ds = 0 \tag{87}$$

or in Fourier coordinates:

$$\sum_{n \neq 0} \langle r_n^{(v)}(t)\, \chi_n^*(t) \rangle = 0 \tag{88}$$

where the Fourier transform is given by:

$$r_n(t) = \frac{1}{N} \int_0^N e^{i \omega_n s}\, r(s,t)\, ds$$

and similarly for $\chi(s,t)$.

We have used periodic boundary conditions, so that $\omega_n = 2\pi n / N$. In addition, to get rid
of the center of mass diffusion, we constrain the center of mass of the system to remain
at fixed position, $r_0(t) = r_0^{(v)}(t) = \chi_0(t) = 0$.
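
The Fourier convention and the centre-of-mass constraint can be checked numerically on a discretized ring chain. The sketch below is illustrative only: the helper names (`fourier_modes`, `radius_of_gyration_sq`, `radius_of_gyration_sq_from_modes`), the replacement of the integral over $s$ by a discrete sum, and the random closed walk used as a test configuration are assumptions made here, not part of the original calculation. It verifies that, once the $n = 0$ mode (the centre of mass) is removed, the sum of $|r_n|^2$ over $n \neq 0$ equals the squared radius of gyration.

```python
import numpy as np


def fourier_modes(positions):
    """Discrete analogue of r_n = (1/N) * sum_s exp(i*omega_n*s) * r(s),
    with omega_n = 2*pi*n/N (periodic boundary conditions)."""
    # np.fft.ifft uses exp(+2j*pi*n*s/N) and already divides by N.
    return np.fft.ifft(positions, axis=0)


def radius_of_gyration_sq(positions):
    """Real-space definition: mean squared distance to the centre of mass."""
    centered = positions - positions.mean(axis=0)
    return float(np.mean(np.sum(centered ** 2, axis=1)))


def radius_of_gyration_sq_from_modes(positions):
    """Mode-space definition: sum over n != 0 of |r_n|^2 (Parseval identity)."""
    modes = fourier_modes(positions)
    return float(np.sum(np.abs(modes[1:]) ** 2))


# Hypothetical test configuration: a closed random walk of 64 monomers in d = 3.
rng = np.random.default_rng(0)
steps = rng.normal(size=(64, 3))
steps -= steps.mean(axis=0)        # zero total displacement, so the walk closes
ring = np.cumsum(steps, axis=0)

print(radius_of_gyration_sq(ring), radius_of_gyration_sq_from_modes(ring))
```

The two printed numbers agree to rounding error, and the $n = 0$ mode returned by `fourier_modes` is the centre of mass, which is the quantity held fixed in the text.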

Equations (83) and (84) can easily be solved in Fourier space. We assume that at
time $t = 0$, the chains are in a $\theta$ solvent, so that the initial condition $\{r(s,0)\}$ obeys
Gaussian statistics. We choose the initial virtual chain to coincide with the real one, so
that $r^{(v)}(s,0) = r(s,0)$ for any $s$. Denoting by an overline the average over the initial conditions,
the correlation function of $r(s,0)$ (in Fourier space) is taken as:



$$\overline{r_n(0)} = 0 \tag{89}$$

$$\overline{r_n(0)\, r_m^*(0)} = \frac{a_0^2}{N \omega_n^2}\, \delta_{mn} \tag{90}$$

In Fourier space, the thermal noise is characterized by:

$$\langle \eta_n(t) \rangle = 0 \tag{91}$$
$$\langle \eta_n(t)\, \eta_m^*(t') \rangle = \frac{2 \zeta k_B T}{N}\, \delta_{mn}\, \delta(t - t'). \tag{92}$$

Replacing $r_n^{(v)}(t)$ and $\chi_n(t)$ by their expressions in (88), and taking thermal and initial
condition averages, we obtain an implicit equation for $a(t)$.

This equation can be solved analytically in both limits $t \ll \tau_R$ (short time limit) and
$t \gg \tau_R$ (long time limit), where $\tau_R$ is the Rouse time.

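In order to take into account the hydrodynamic interactions with the solvent, one has
to modify the Langevin equations in the following way; equations (79) and (81) have
to be replaced by:

$$\frac{\partial r(s,t)}{\partial t} = \int_0^N ds'\, O(r(s,t) - r(s',t)) \left( -\frac{\partial H}{\partial r(s',t)} + \eta(s',t) \right) \tag{93}$$

$$\frac{\partial r^{(v)}(s,t)}{\partial t} = \int_0^N ds'\, O(r^{(v)}(s,t) - r^{(v)}(s',t)) \left( -\frac{\partial H_v}{\partial r^{(v)}(s',t)} + \eta(s',t) \right) \tag{94}$$

where $O(r)$ is the Oseen tensor:

$$O_{\alpha\beta}(r) = \frac{1}{8 \pi \eta |r|} \left( \delta_{\alpha\beta} + \frac{r_\alpha r_\beta}{|r|^2} \right)$$

and $\eta$ is the viscosity of the solvent.

Results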



In the absence of hydrodynamic interactions the results are the following: at short times
$t \ll \tau_R$, the radius of gyration decreases as a power law:



Rl{t)=Nal{l-\l-) (95) 

with a characteristic time $\tau_c$ defined in (97), and for large times $t \gg \tau_c$, the radius of
gyration relaxes to that of a compact globule, resulting from the competition between the
two-body and three-body terms, according to

1(1 1 1 L 

Rg{t) {-)^m{l + e -2), (96) 

V 

where 

Note that for dimensions larger than 2, the relaxation time is much shorter than the 
Rouse time. For example in d = 3, T2 ~ Na compared to N'^ 

The first stage of collapse can be characterized by a time scale Tc given by: 

J _ 8(4vr)iar^ ^^^^ 



VGDTTvIdN 



2-d 
2 



where ^ ^ 

Id = du I du J with \u — u \ > A 

-'O ^0 [\u-u'\{l-\u-u'\)\2 

and A = is a short distance cut-off. 

For $d < 2$, the integral converges for a small cut-off, and $I_d$ is independent of $N$. On the
other hand, for $d > 2$, the integral is infra-red divergent and thus there is an explicit
dependence on the cut-off. It is easily seen that this dependence exactly cancels out
the $N$ dependence in (97) so that the final characteristic time $\tau_c$ is finite (independent of
$N$). In particular, for $d = 3$, we find

647r^ / OqA f clq 



3D \v J \5.22 

The order of magnitude of this short time collapse can be calculated for a typical
protein in water. The diffusion constant of a single amino-acid in water is typically
D ~ 10~^cm^/s. As pointed out in reference E^, a monomer unit in a protein consists of
approximately 5 amino acids (due to chain stiffness). Thus a typical value for the Kuhn
length is $a_0 = 7$ Å. Consequently, the number of monomer units is 30 for a chain of 150
amino acids. We find a microscopic characteristic time Tc ~ 10~^s , the Rouse time being
tr ~ 10~^ s. Note that the relaxation time T2 ~ 10~^s is of the order of magnitude of
the Rouse time. The microscopic time $\tau_c$ is several orders of magnitude lower than other
estimates in the literature (see references 00).
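
The arithmetic behind these orders of magnitude can be sketched as follows. Two inputs are assumptions made here and are not taken from the text: the value $D = 10^{-5}$ cm$^2$/s for the diffusion constant, and the textbook Rouse time of a free linear Gaussian chain, $\tau_R = N^2 a_0^2 / (3 \pi^2 D)$ with $\zeta = k_B T / D$. The periodic chain used in the calculation above may differ from this by a numerical factor, so the output should be read as an order of magnitude only. The function names are hypothetical.

```python
import math


def kuhn_segments(n_amino_acids, residues_per_segment=5):
    """Number of effective monomer units when one unit groups several residues."""
    return n_amino_acids // residues_per_segment


def rouse_time_linear_chain(n_segments, kuhn_length_cm, diffusion_cm2_per_s):
    """Textbook Rouse time of a linear Gaussian chain (assumed formula):
    tau_R = N^2 a^2 / (3 pi^2 D), using zeta = k_B T / D."""
    return (n_segments ** 2 * kuhn_length_cm ** 2
            / (3.0 * math.pi ** 2 * diffusion_cm2_per_s))


n_seg = kuhn_segments(150)              # 150 residues, 5 residues per unit
tau_rouse = rouse_time_linear_chain(
    n_seg,
    kuhn_length_cm=7e-8,                # a_0 = 7 Angstrom, from the text
    diffusion_cm2_per_s=1e-5,           # assumed value, not from the text
)
print(n_seg, tau_rouse)
```

With these inputs the chain has 30 effective monomers and the assumed Rouse formula gives a time of order $10^{-8}$ s.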






If one includes the hydrodynamic interactions, the results are modified as follows:
at short times $t \ll \tau_Z$, where $\tau_Z$ is the Zimm time, the radius of
gyration decreases again like a power law, but with a smaller exponent than in the absence
of hydrodynamic backflows:

Rl{t) = Nal{l-(^y) (98) 



Trh 



where the characteristic time Tc h is defined by 



3-d 

vN—Jd 



The integral Jd is given by: 
Jd= du du'y ^—P ^ J with \u-u\>A 

and A = is a short distance cut-off. 

The singularity of can be analyzed, by noting that for x — > O"*", 



\/2^ /1\ v-^ 1 — cos(27rpx) , ,3 



Therefore, for $d < 3$, the integral converges for a small cut-off, and $J_d$ is independent of $N$.
On the other hand, for $d > 3$, the integral is infra-red divergent and thus there is an
explicit dependence on the cut-off, more precisely, $J_d \sim N^{(d-3)/2}$.

Note that again, $\tau_{c,h}$ does not depend on the number of monomers $N$.

Numerically, we find r^^h ~ 10~^s for a protein in water, and the Zimm time is 
TZ ~ lO-^s. 

For larger times , the radius of gyration relaxes to that of a compact globule according 

to 

W 1 1 — 

Rgit) {-)im{l + e ^^•>^), (99) 

V 



where 



Summary 

We have presented here a model where explicit attractive hydrophobic forces have been 
introduced for the collapse of the protein chain. We have assumed that the protein 
specificity is not important at the beginning of the collapse, and that the dynamics is the 
same as for a homopolymer chain. Our main result is that the chain collapses locally on 
a time scale Tc ~ 10~^ s, with or without the hydrodynamic interactions with the solvent. 






This time is smaller than other theoretical estimates and doesn't depend on the number 
of amino-acids. This shows that in the early stage, the collapse is indeed a very local 
phenomenon, where nearby amino-acids aggregate into small domains. 

We emphasize again that in the large time regime, this calculation (as well as all 
other microscopic calculations) is not fully reliable, since it widely underestimates the 
entropy. The Flory theory predicts a zero entropy for the collapsed phase, whereas it is 
known to be extensive. As a consequence, this method, as well as others, cannot account
correctly for the conformational changes between the various collapsed configurations. As 
a result, the relaxation times are orders of magnitude smaller than those predicted by 
more phenomenological theories 1^. 

Recently, there have been some promising attempts to extend the microscopic approach
to the dynamics of random heteropolymers. From a technical point of view,
these approaches avoid the use of replicas (but not the quenched averages). Since these 
calculations are quite involved, we refer the reader to the original papers. 

Experimentation in this field is quite difficult since one has to work in a very dilute 
regime in order to avoid aggregation of chains and truly see the hydrophobic collapse.
The most promising experiments are those by Chan et al. who study sub-millisecond 
protein folding by ultra-rapid mixing; based on optical techniques, these experiments can 
monitor folding up to the microsecond time-scale. 



3.2 Phenomenological approach 

We have seen in the previous section that under certain conditions, random heteropolymer 
models can be viewed as REM. As described in reference!^, the native state of a protein 
corresponds to the lowest energy state of the system, and all states with non native 
contacts have a higher energy. Due to the diversity of the amino acids, certain non native 
contacts can be favorable, whereas others might be unfavorable. It is thus natural to 
assume that the low lying states of a protein can be described by a REM. As we have 
discussed in section |2^, this assumption of a REM is reasonable in the collapsed phase 
of the protein, but certainly not in its coil phase. 

Although the analogy of heteropolymer models with REMs is purely thermodynamical 
and has no deep dynamical significance, it is tempting to pursue this analogy for the 
dynamics. One is thus led to model the dynamics of folding of a protein by the dynamics
of a REM (see ^3clrE3'E3 and references therein). This approach neglects the geometrical
frustration that the system encounters when evolving from one conformation to another, 
but mimics the energetic frustration. 

We follow the method presented inS'0. We use a master equation to describe the 
transitions between the various states of the system, and design the transition rates in 
such a way that the system eventually relaxes to equilibrium. It should be noted that in 
some systems, there are cases where the system never relaxes to equilibrium, although the 
transition rates satisfy detailed balance. This phenomenon, related to aging, is treated 
in the review by Bouchaud et al. We believe that this phenomenon is not present in
proteins, due to their small sizes.

The master equation we use is:

$$\frac{dP_\alpha}{dt} = \sum_\gamma \left( W_{\alpha\gamma} P_\gamma(t) - W_{\gamma\alpha} P_\alpha(t) \right) \tag{100}$$

where $P_\alpha(t)$ denotes the probability for state $\alpha$ with energy $E_\alpha$ to be occupied at time $t$,
and $W_{\alpha\gamma}$ is the transition probability from state $\gamma$ to state $\alpha$ per unit time.

To guarantee that the equilibrium distribution of the system will be a Boltzmann
distribution, it is sufficient that the transition rates satisfy the detailed balance relation:

$$W_{\alpha\gamma}\, e^{-\beta E_\gamma} = W_{\gamma\alpha}\, e^{-\beta E_\alpha} \tag{101}$$

We will discuss later in detail the choice of the transition rates. However, one important
point is that in a globular phase (for which the REM has some significance), the $W_{\alpha\gamma}$
should not connect any state to any other state.

To simplify notations, we will consider a REM with $N$ degrees of freedom and $2^N$
states, instead of the more general form of section 2.6. The probability for the system
to have an energy level of energy $E$ is given by:

$$P(E) = \frac{1}{\sqrt{\pi N J^2}} \exp\left( -\frac{E^2}{N J^2} \right) \tag{102}$$

where $J$ is an appropriate energy scale. As was discussed in section 2.6, such a model has
a freezing transition at temperature $T_c$:

$$T_c = \frac{J}{2 \sqrt{\log 2}}$$

The typical energies scale like $J \sqrt{N}$, and it is not reasonable to assume a finite
transition rate for a direct transition from any state to any other.

We are thus led to assume that the system was prepared at not too high temperature 
(i.e. at finite excitation energy from its ground state), and that it will relax by making 
transitions between the lowest lying states of the system. This is an important assumption, 
but we believe that it is crucial in representing proteins by REMs. 

If we restrict ourselves to the lowest lying states of the REM, it is well known that the 
distribution of energies of the system is not Gaussian anymore, but rather exponential.
Indeed, we are not sampling the bulk of the probability distribution, but rather its low 
energy tail (rare events). The calculation goes as follows. Let us denote by M the number 
of states that we want to include in the dynamics. This corresponds to the occupation 
of the M lowest energy states of the REM. In other words, one may remove all energies 
above a certain threshold energy E* , such that: 

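$$M = 2^N \int_{-\infty}^{E^*} dE\, P(E) \tag{103}$$

The integral of $P(E)$ yields an error function:

$$\frac{1}{2} \left( 1 + \mathrm{erf}\left( \frac{E^*}{J \sqrt{N}} \right) \right) = \frac{M}{2^N} \tag{104}$$

where the error function is defined by:

$$\mathrm{erf}(z) = \frac{2}{\sqrt{\pi}} \int_0^z dt\, e^{-t^2} \tag{105}$$

$$\mathrm{erf}(z) \simeq 1 - \frac{e^{-z^2}}{z \sqrt{\pi}} \quad \text{if } z \to +\infty \tag{106}$$

Since we take $M / 2^N \to 0$, we obtain: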



$$E^* = E_0(N) + \frac{J}{2 \sqrt{\log 2}} \log\left( 2 M \sqrt{\pi \log 2} \right) \tag{107}$$



where $E_0(N)$ is the ground state energy of the REM

$$E_0(N) = -N J \sqrt{\log 2} + \frac{J \log N}{4 \sqrt{\log 2}} \tag{108}$$

The term involving the number of states $M$ considered represents the low lying levels.
The energies of these states, denoted by $\{E_\alpha\}$, are independent random variables between
$E_0$ and $E^*$. Linearizing the Gaussian distribution around the ground state, we find that
these low lying states are distributed according to the exponential law:

$$\mathcal{P}(E_\alpha) = \beta_c\, e^{\beta_c (E_\alpha - E^*)} \quad \text{if } E_\alpha < E^*$$
$$\mathcal{P}(E_\alpha) = 0 \quad \text{if } E_\alpha > E^* \tag{109}$$

where

$$\beta_c = \frac{1}{T_c} \tag{110}$$

and

$$T_c = \frac{J}{2 \sqrt{\log 2}} \tag{111}$$

is the critical temperature of the REM.
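
The linearization step can be sketched explicitly. The block below assumes the Gaussian level density $P(E) \propto \exp(-E^2 / N J^2)$ (a normalization chosen here because it reproduces the quoted $T_c$) and keeps only the leading term $E_0 \simeq -N J \sqrt{\log 2}$ of the ground state energy.

```latex
\begin{align*}
\log P(E) &= \mathrm{const} - \frac{E^2}{N J^2},
  \qquad E = E_0 + \delta, \quad |\delta| \ll |E_0| \\
\log P(E_0 + \delta) &\simeq \log P(E_0) - \frac{2 E_0}{N J^2}\, \delta \\
  &= \log P(E_0) + \frac{2 \sqrt{\log 2}}{J}\, \delta
   = \log P(E_0) + \beta_c\, \delta
\end{align*}
```

The density of low lying levels therefore grows as $e^{\beta_c (E - E_0)}$, with the growth rate fixed by the freezing temperature of the REM.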

There remains the question of the sensitivity of the method with respect to the choice
of the number of states $M$. From (107), we obtain:



$$v = M e^{-\beta_c (E^* - E_0)} = \frac{1}{2 \sqrt{\pi \log 2}} \tag{112}$$

independent of $M$. So, we can safely take the limit where the number of states $M$ goes
to infinity, and adjust the threshold energy $E^*$ so that equation (112) is satisfied.

We are thus led to study the dynamics of a set of random independent energy levels,
distributed according to an exponential distribution law (109).

To completely specify the dynamics of the system, we must define the transition 
probabilities. The choice is not unique of course, but we will be guided by some physical 
considerations. The main assumption we will make is that the transition rate of the 
system between two states is completely dominated by the largest barrier that the system 
encounters between the two states. As we mentioned above, in order to use REM types of 
models, the system must be in a globular phase. We have seen in the previous section that
under certain conditions, there may be a "molten globule" phase in the high temperature
region, and a folded or native state in the low temperature phase. Our goal is to study 
the dynamics in both phases. 

The choice of transition rates is governed by the detailed balance equation (101).
Following ill, we take a general form:

$$W_{\alpha\gamma} = \Gamma_0\, e^{-\beta \epsilon_\alpha}\, V_\alpha V_\gamma \tag{113}$$

where the $\epsilon_\alpha$ are the random non-extensive parts of the random energies $E_\alpha = E_0(N) + \epsilon_\alpha$
and the $V_\alpha > 0$ can be viewed as barriers (this point will be discussed in the following).
The constant $\Gamma_0$ is the inverse of the transition time scale.
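
That a rate of this product form satisfies detailed balance can be checked numerically. The sketch below assumes the reading $W_{\alpha\gamma} = \Gamma_0\, e^{-\beta \epsilon_\alpha} V_\alpha V_\gamma$ for the rate from state $\gamma$ to state $\alpha$; the function names and the five-level example are hypothetical and chosen only for illustration. It verifies that the equilibrium fluxes are symmetric and that the Boltzmann distribution is stationary under the gain-loss master equation.

```python
import numpy as np


def transition_rates(eps, barriers, beta, gamma0=1.0):
    """W[a, g] = rate from state g to state a
               = gamma0 * exp(-beta * eps[a]) * V[a] * V[g]  (assumed form)."""
    eps = np.asarray(eps, dtype=float)
    barriers = np.asarray(barriers, dtype=float)
    return (gamma0 * np.exp(-beta * eps)[:, None]
            * barriers[:, None] * barriers[None, :])


def master_operator(rates):
    """Matrix L such that dP/dt = L @ P for the gain-loss master equation."""
    off_diagonal = rates - np.diag(np.diag(rates))
    escape = off_diagonal.sum(axis=0)      # total rate out of each state
    return off_diagonal - np.diag(escape)


# Hypothetical example: five levels with distinct barrier factors.
eps = np.array([-2.0, -1.5, -1.0, -0.5, 0.0])
barriers = np.array([1.0, 1.2, 1.4, 1.6, 1.8])
beta = 1.0

W = transition_rates(eps, barriers, beta)
boltzmann = np.exp(-beta * eps) / np.exp(-beta * eps).sum()
flux = W * boltzmann[None, :]              # flux[a, g] = W[a, g] * P_eq[g]

print(np.allclose(flux, flux.T))
print(np.allclose(master_operator(W) @ boltzmann, 0.0))
```

The symmetry of `flux` is the detailed balance relation itself, and the vanishing of `master_operator(W) @ boltzmann` confirms that the Boltzmann weights are a fixed point of the dynamics for any choice of positive barrier factors.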

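Without specifying the $V_\alpha$, the master equation (100) can be solved by taking a
Laplace transform. Defining the Laplace transform of $P_\alpha$ by:

$$P_\alpha(z) = \int_0^\infty dt\, e^{-zt} P_\alpha(t) \tag{114}$$

the master equation can easily be solved. Indeed, defining:

$$Q(z) = \sum_\alpha V_\alpha P_\alpha(z) \tag{115}$$

$$C = \sum_\alpha e^{-\beta \epsilon_\alpha} V_\alpha \tag{116}$$

we find:

$$Q(z) = \frac{\sum_\alpha \dfrac{V_\alpha P_\alpha(0)}{z + \Gamma_0 V_\alpha C}}{1 - \Gamma_0 \sum_\alpha \dfrac{e^{-\beta \epsilon_\alpha} V_\alpha^2}{z + \Gamma_0 V_\alpha C}} \tag{117}$$

from which we get

$$P_\alpha(z) = \frac{P_\alpha(0) + \Gamma_0\, e^{-\beta \epsilon_\alpha} V_\alpha\, Q(z)}{z + \Gamma_0 V_\alpha C} \tag{118}$$

where $P_\alpha(0)$ is the initial probability of occupation of state $\alpha$.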

The function $P_\alpha(z)$ is meromorphic, and thus $P_\alpha(t)$ can be obtained by the inverse
Laplace transform:

$$P_\alpha(t) = \int_{c - i\infty}^{c + i\infty} \frac{dz}{2 \pi i}\, e^{zt} P_\alpha(z) \tag{119}$$

where $c$ is a real constant to the right of the largest pole of $P_\alpha(z)$. This integral can be
performed by using the residue method:

$$P_\alpha(t) = \sum_{\text{all poles}} \mathrm{Res}(P_\alpha, z_p)\, e^{z_p t} \tag{120}$$

where "Res" denotes the residue and $z_p$ the value of the pole.

Various other time correlation functions can be calculated along these lines. We are
thus led to study the pole and residue structure of (118).

Apart from the obvious pole $z_p = -\Gamma_0 V_\alpha C$, the other poles are the solutions of the
equation:

$$1 = \Gamma_0 \sum_\alpha \frac{e^{-\beta \epsilon_\alpha} V_\alpha^2}{z + \Gamma_0 V_\alpha C} \tag{121}$$

Figure 5: Graphical solution of the pole equation

This equation can be solved graphically (see Fig. 5).

We see that there is a pole at $z_p = 0$, corresponding to the static (infinite time) limit.
In addition, one sees that all other poles are negative, and close to the values $-\Gamma_0 V_\alpha C$. To
analyze further the explicit time dependence of (119), it is necessary to specify the values
of the $V_\alpha$, and to perform the quenched average over the energy levels distribution.
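
The pole structure can be explored numerically for a small number of levels. The sketch below is an illustration under stated assumptions: it takes the rate form $W_{\alpha\gamma} = \Gamma_0 e^{-\beta \epsilon_\alpha} V_\alpha V_\gamma$, builds the master operator in its symmetrized form (a similarity transform by the square root of the Boltzmann weights, which leaves the spectrum unchanged), and checks that each eigenvalue $z$ solves $1 = \Gamma_0 \sum_\alpha e^{-\beta \epsilon_\alpha} V_\alpha^2 / (z + \Gamma_0 V_\alpha C)$. The function names and the five-level example are hypothetical.

```python
import numpy as np


def symmetrized_operator(eps, barriers, beta, gamma0=1.0):
    """Symmetric matrix with the same spectrum as the master operator:
    S[a, g] = gamma0 * (V_a V_g exp(-beta (eps_a + eps_g) / 2) - C V_a delta_ag)."""
    eps = np.asarray(eps, dtype=float)
    v = np.asarray(barriers, dtype=float)
    c = np.sum(np.exp(-beta * eps) * v)
    half = v * np.exp(-0.5 * beta * eps)
    return gamma0 * (np.outer(half, half) - c * np.diag(v))


def pole_function(z, eps, barriers, beta, gamma0=1.0):
    """Right-hand side of the pole equation; it equals 1 at every pole."""
    eps = np.asarray(eps, dtype=float)
    v = np.asarray(barriers, dtype=float)
    c = np.sum(np.exp(-beta * eps) * v)
    return gamma0 * np.sum(np.exp(-beta * eps) * v ** 2 / (z + gamma0 * v * c))


# Hypothetical example: five levels with distinct barrier factors.
eps = np.array([-2.0, -1.5, -1.0, -0.5, 0.0])
barriers = np.array([1.0, 1.2, 1.4, 1.6, 1.8])
beta = 1.0

poles = np.linalg.eigvalsh(symmetrized_operator(eps, barriers, beta))
residuals = [pole_function(z, eps, barriers, beta) - 1.0 for z in poles]
print(poles)
print(residuals)
```

In this example the largest eigenvalue is zero to rounding error (the static limit), and the remaining ones are negative and interlaced with the values $-\Gamma_0 V_\alpha C$, which is the picture given by the graphical solution.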

From the above analysis, we see that as $M \to \infty$, the poles will concentrate between
$-\Gamma_0 V_{max} C < 0$ and 0, and eventually give rise to a cut in the complex plane. The time
dependence of $P_\alpha(t)$ is controlled by the behaviour of the density of poles around $z_p = 0$.

In the following, we will assume that the system undergoes two phase transitions: one
at $T_\theta$, similar to a theta point, from a swollen to a disordered collapsed phase, and one at
lower temperature $T_c$, between the disordered collapsed phase and the native phase. As
mentioned above, the REM makes sense only in the globular region, so that we restrict
our description to that region. Since the chain constraint is not present in this approach,
there is no induced geometry in the phase space. The following choices of transition rates
mimic in some sense a topology in phase space.

Model I: the high temperature region: $T_c < T < T_\theta$

In the high temperature phase, we expect the system to have a single well-defined minimum,
corresponding to a "liquid" condensed phase. This may correspond to the so-called
"molten-globule" phase.

In addition, there should exist many local minima due to the existence of entanglement
barriers. The corresponding phase space is schematically represented in Fig. 6.

Figure 6: Schematic phase space of model I

The point labeled C represents this liquid phase. The typical path for a transition
from B to A is downhill from B to C then uphill from C to A. The dominant barrier is
thus from C to A, and it is natural to assume an Arrhenius like law:

$$W_{AB} = W_0 \exp(-\beta (E_A - E_C)) \tag{122}$$
$$W_{AB} = W_0 \exp(-\beta (\epsilon_A - \epsilon_C)) \tag{123}$$

The transition probability depends only on the final state $\alpha$ of the system. This amounts
to taking $V_\delta = V$ for all states $\delta$ in equation (113) and $W_0 = \Gamma_0 V^2 e^{-\beta \epsilon_C}$. This model
has been studied in E3. Although observables decay exponentially for a given instance of
energy levels, for large times, the disorder averaged observables relax to their equilibrium
value like stretched exponentials:



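$$\overline{P_\alpha(t)} \simeq P_\alpha^{eq} + \left( P_\alpha(0) - P_\alpha^{eq} \right) \exp\left[ -v_\theta\, \Gamma\left( 1 - \frac{\beta_\theta}{\beta} \right) (W_0 t)^{\beta_\theta / \beta} \right] \tag{125}$$

where $\Gamma$ is the gamma function, $\beta_\theta$ is the inverse theta temperature of the system, and $v_\theta$
is given by equation (112), with $\beta_\theta$ instead of $\beta_c$.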

In this liquid condensed phase of a globular protein, dynamical processes are slowed
down, due to the large number of metastable states on the way to the ground state. The
exponent of the stretched exponential is equal to $T/T_\theta < 1$, and becomes equal to 1 at
the theta temperature. For temperatures close to the theta temperature, it is close to 1
and thus the dynamics looks close to a standard exponential relaxation.
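
The effect of the exponent $T/T_\theta$ can be illustrated with a few lines of code. The sketch below assumes that the disorder averaged relaxation function has the form $\exp[-v\, \Gamma(1 - T/T_\theta)\, (W_0 t)^{T/T_\theta}]$; this form, the function name and the default values `w0 = 1` and `v = 1` are assumptions made for illustration.

```python
import math


def stretched_relaxation(t, temperature, theta_temperature, w0=1.0, v=1.0):
    """Assumed relaxation function exp(-v * Gamma(1 - x) * (w0 * t)**x),
    with stretching exponent x = T / T_theta restricted to 0 < x < 1."""
    x = temperature / theta_temperature
    if not 0.0 < x < 1.0:
        raise ValueError("requires 0 < T / T_theta < 1")
    return math.exp(-v * math.gamma(1.0 - x) * (w0 * t) ** x)


def effective_exponent(t1, t2, temperature, theta_temperature):
    """Slope of log(-log(phi)) versus log(t) between two times."""
    y1 = math.log(-math.log(stretched_relaxation(t1, temperature, theta_temperature)))
    y2 = math.log(-math.log(stretched_relaxation(t2, temperature, theta_temperature)))
    return (y2 - y1) / (math.log(t2) - math.log(t1))


print(stretched_relaxation(1.0, 0.5, 1.0))
print(effective_exponent(1.0, 10.0, 0.5, 1.0))
```

Plotting $\log(-\log \phi)$ against $\log t$ gives a straight line whose slope is the stretching exponent, which is the usual way such behaviour is identified in simulation data.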

Such a stretched exponential behaviour has been observed in simulations of random
heteropolymer models over several orders of magnitude in time.

The phase space is reminiscent of what is defined as a "funnel" by @, and this suggests
that the dynamics through a funnel should be stretched exponential. Let us note also
that the phase space assumed here is similar to the one expected in each valley of the low
temperature phase (see below).



Figure 7: Schematic phase space of model II

Model II: the low temperature region: $T < T_c$

In the low temperature phase, the system is expected to have a rugged landscape, with
many quasi-degenerate minima, corresponding to incorrectly folded proteins. This corresponds
to the phase space of the native protein. This phase space is schematically
represented in Fig. 7.

As in the high temperature case, the path from a low lying state B to A is through a
barrier, represented by C. Assuming that the height of the barrier between two states is
essentially constant, we take the following form for the transition amplitudes:

$$W_{AB} = W_0 \exp(-\beta (E_C - E_B)) \tag{126}$$
$$W_{AB} = W_0 \exp(-\beta (\epsilon_C - \epsilon_B)) \tag{127}$$

The transition probability depends only on the initial state $\gamma$ of the system, and more
precisely on the barrier from the initial state to some typical point C through which the
system should pass. This corresponds to the special case $V_\delta = V e^{\beta \epsilon_\delta}$ in equation (113),
with $W_0 = \Gamma_0 V^2 e^{+\beta \epsilon_C}$.

This model was studied by Koper and Hilhorst. The quenched average of (119)
can be performed, and the analysis of the long time behavior of the probability yields
algebraic decay at large times:

Pjt)^^-t~^ (128) 

This slow relaxation in the phase space of the native state is due to the existence of
many quasi-degenerate local minima.

A model for chaperon assisted protein folding

It has been shown recently that in living cells, there are specific enzymes which catalyze
the folding of proteins. These catalytic proteins have been named chaperones; they were 
first discovered while heating up some bacteria (hence their original name of "heat-shock" 
proteins). Each chaperon can help certain specific proteins to fold. Although there is no 
consensus on the mechanisms through which they act, their universal character (they can 
catalyze the folding of many different proteins) suggests that they recognize some major 
differences between a native protein and a misfolded one. As we have seen in the previous 
sections, the collapse transition is driven by the hydrophobic effect: the protein folds so 
as to maximize the number of hydrophobic residues in its inner core, and the number of 
hydrophilic or charged residues on its outer shell. Presumably, misfolded proteins will 
have a larger number of hydrophobic residues on the outside, and a larger number of 
hydrophilic residues in the inside. The presence of these hydrophobic patches on the 
outside of the misfolded protein could then be recognized by a chaperon. 

Let us discuss a model that was originally proposed by Todd et al. The idea is that
the chaperone binds to the misfolded protein (by attaching to its hydrophobic
patches), and transfers a large amount of energy to it (through ATP
hydrolysis). This "energy kick" pushes the protein out of its misfolded minimum, and
gives it a chance to relax back to the native state. In addition, the chaperone does not bind
to the native protein (since it has an optimal outer shell, with no apparent
hydrophobic patch).

This model has been studied in the master equation formalism defined above. The
random energy states are the misfolded states, and the chaperone provides energy to induce
transitions between these states. In addition, the native state, which we denote by 0, is not
recognized by the chaperone. Thus, once the protein is in the native state, it will stay
there forever.

From the point of view of the transition rates $W_{\alpha\gamma}$, we use the same model as in
equation (113), except that we impose that no transition leads out of the native state 0
towards any other state $\alpha$:

$$ W_{\alpha 0} = 0 \qquad (129) $$

Since misfolded proteins can refold into the native state, $W_{0\alpha}$ (which is the transition
rate from any state $\alpha$ to 0) has no reason to vanish. These assumptions seem to violate
detailed balance (101). We can however ignore this fact by noting that folding in the presence
of a chaperone is not really a thermal equilibrium phenomenon, and that these assumptions
amount to assuming that the free energy of the native state is much lower than that of all
misfolded states (large gap).

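We may write two equations, one for the occupation of the native state 0 and one for
any misfolded state $\alpha$:

$$ \frac{d}{dt} P_0(t) = \Gamma_0 \, e^{-\beta \epsilon_0} V_0 \sum_{\gamma \neq 0} V_\gamma P_\gamma(t) \qquad (130) $$

$$ \frac{d}{dt} P_\alpha(t) = \Gamma_0 \left[ e^{-\beta \epsilon_\alpha} V_\alpha \sum_{\gamma \neq 0} V_\gamma P_\gamma(t) - V_\alpha P_\alpha(t) \sum_{\gamma} e^{-\beta \epsilon_\gamma} V_\gamma \right] \qquad (131) $$

These equations can be solved along the lines described above. The Laplace transform
of the occupation probability of the native state is: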

Po{z) = — \- To Q{z) (132) 

where 

$$ C = \sum_{\gamma} e^{-\beta \epsilon_\gamma} V_\gamma \qquad (134) $$

and the $P_\alpha(0)$ are the initial probabilities.

As in the previous section, the large time behavior of $P_0(t)$ is governed by the pole
structure of (132). As usual, there is a pole at $z = 0$ corresponding to the static limit.
It is easily seen from (132) that the residue of $P_0(z)$ at $z = 0$ is equal to 1. This implies
that:

$$ \lim_{t \to \infty} P_0(t) = 1 \qquad (135) $$

In contrast to thermal equilibrium, the stationary solution populates only the native
state.

The rest of the poles are given by the solutions of the equation: 



This equation can again be solved graphically. By contrast to ( |11^ ), where the sum runs
over all states, here it runs over all states except the native one. As a result, $z = 0$ is
no longer a solution of this equation; it is however a pole of $P_0(z)$ (see above).

To be more specific and simplify the calculations, we discuss the case analogous to
model I above. We take $V_\alpha = V$. Equations (130) can be solved explicitly. Assuming
that the native state is not populated at $t = 0$, we find:

$$ P_0(t) = 1 - e^{-\Gamma_0 V_0 V e^{-\beta \epsilon_0} t} \qquad (137) $$
and the quenched average can be performed, yielding: 

/bW~*^ool-y e-^o^o^^-''^** (138) 

where C is a constant. 
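
The exponential filling of the native state can be checked numerically. The following is a minimal sketch, not the authors' calculation: it assumes transition rates of the form $\Gamma_0 V_\alpha V_\gamma e^{-\beta\epsilon_\alpha}$ (our reading of equations (130) and (131)), Gaussian random energies for the misfolded states, and illustrative parameter values; the function names `rate_matrix` and `evolve` are hypothetical.

```python
import numpy as np

def rate_matrix(eps, V, gamma0=1.0, beta=1.0):
    """Generator M of the master equation dP/dt = M @ P (state 0 = native)."""
    # W[a, g]: assumed rate from state g to state a, Gamma_0 * V_a * V_g * exp(-beta * eps_a)
    W = gamma0 * np.outer(V * np.exp(-beta * eps), V)
    W[:, 0] = 0.0                      # equation (129): no transition out of the native state
    np.fill_diagonal(W, 0.0)           # a state does not hop onto itself
    return W - np.diag(W.sum(axis=0))  # gain terms minus total escape rate of each state

def evolve(M, P, t_final, dt=0.01):
    """Integrate dP/dt = M @ P with a fixed-step fourth-order Runge-Kutta scheme."""
    for _ in range(int(round(t_final / dt))):
        k1 = M @ P
        k2 = M @ (P + 0.5 * dt * k1)
        k3 = M @ (P + 0.5 * dt * k2)
        k4 = M @ (P + dt * k3)
        P = P + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return P

gamma0, beta = 1.0, 1.0                  # illustrative values (assumption)
rng = np.random.default_rng(0)
n_misfolded = 40
eps = np.concatenate(([-3.0], rng.normal(0.0, 1.0, n_misfolded)))  # eps[0]: native energy
V = np.full(n_misfolded + 1, 0.05)       # V_alpha = V for all states, and V_0 = V here
P_init = np.concatenate(([0.0], np.full(n_misfolded, 1.0 / n_misfolded)))

t_final = 40.0
P_final = evolve(rate_matrix(eps, V, gamma0, beta), P_init, t_final)
rate = gamma0 * np.exp(-beta * eps[0]) * V[0] * V[1]
P0_exact = 1.0 - np.exp(-rate * t_final)  # closed form of equation (137)
print(P_final[0], P0_exact)
```

With these rates the native occupation does not depend on the random energies of the misfolded states, because every misfolded state feeds the native state at the same rate.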

We see on this very simple model that the introduction of a trap (in the energy
landscape) from which the system cannot escape modifies the relaxation from stretched
exponential or algebraic decay to simple exponential. In addition, the yield of the
relaxation is 1, instead of a Boltzmann weight. This feature is quite independent of the
specific form of the transition rates used.

This qualitative feature is observed in experiments, where the introduction of
chaperones is seen to enhance the folding rates by several orders of magnitude, and to
increase significantly the yield of folding.

3.3 Conclusion



We have proposed two methods to approach the dynamics of protein folding. Numerical
simulations seem to support a two timescale picture for protein folding: i) a fast
hydrophobic driven collapse followed by ii) a slow rearrangement process in the compact
globule.

Our first approach, which strongly relies on the polymeric character of the protein, 
yields results which should be valid in the short time limit, and thus applicable to stage 
i). On the other hand, the phenomenological approach stresses the heterogeneity of the 
system and describes the transition between the various compact conformations. It should 
therefore be applicable to stage ii). 

4 General conclusion 

We have presented a disorder oriented view of the protein folding problem. A protein is
modelled by a heteropolymeric chain, with several possible types of interactions among
the monomers. Both static and dynamical issues were tackled. These models, as well
as our approximate treatment thereof, may be a first step towards a more realistic
description of proteins. Among the weaknesses of the approaches we have presented, let us
mention the schematic interactions, the thermodynamic limit, the continuous description,
the quenched average (instead of well defined sequences), the mean-field and ground-state
dominance approximations, etc. These approximations are used to render calculations
tractable. Although they might sometimes be crude and unrealistic, we believe that
our approach is useful, since it may shed some light on the connection of the folded state
with other glassy systems.

Appendix A: Basic polymer physics 

A1: General Introduction

In this section, we introduce simple concepts of polymer physics that will be used in
the following sections. General references, with a larger scope, include the books by de
Gennes, Doi and Edwards, des Cloizeaux and Jannink, and Freed.

A polymer chain in a solvent is modelled as a sequence of links representing the 
monomers. Typical values of the polymerization index $N$ are 10^ in polymer chemistry,
10^ — 10^ in proteins, and up to 10^'' for DNA molecules, which are the longest known 
polymers. Homopolymers are built from identical monomers, whereas heteropolymers 
may contain different units (e.g. 20 aminoacids for proteins or 4 nucleotides for DNA). 

In the following, we will consider only the single chain problem; indeed, the (in vitro) 
protein folding problem is a priori a one chain problem (we leave aside the question of 
chaperone assisted folding). The partition function for the chain can be written as: 

$$ Z = \int \prod_i d\vec{r}_i \, \prod_i g(\vec{r}_i, \vec{r}_{i+1}) \, \exp(-\beta \mathcal{H}) \qquad (139) $$

where $\beta$ is the inverse of the temperature $T$. The vectors $\vec{r}_i$ denote the positions of the
nodes. For example, in polyethylene, the $\vec{r}_i$ would represent the positions of the carbon
atoms. For future use, we will assume that the polymer is embedded in a $d$-dimensional
space.

The chain constraint is expressed by the factors $g(\vec{r}, \vec{r}')$. Possible choices are, for
instance:

(i) $g(\vec{r}, \vec{r}') = \delta(|\vec{r} - \vec{r}'| - a)$, for freely hinged monomers of fixed length $a$.

(ii) one can further modify $g$ by restricting the nodes to belong to a regular lattice,
which is a quite common simplification in Statistical Mechanics.

(iii) $g(\vec{r}, \vec{r}') = \left(\frac{d}{2\pi a^2}\right)^{d/2} \exp\left(-\frac{d\,(\vec{r} - \vec{r}')^2}{2a^2}\right)$, if the monomer length is allowed to fluctuate
around $a$.

If one wishes to include curvature (resp. torsion) effects, one should use a function
$g(\vec{r}_i, \vec{r}_{i+1}, \vec{r}_{i+2})$ (resp. $g(\vec{r}_i, \vec{r}_{i+1}, \vec{r}_{i+2}, \vec{r}_{i+3})$).

The "Hamiltonian" $\beta\mathcal{H}$ may be expanded as:

$$ \beta\mathcal{H} = \frac{1}{2} \sum_{i \neq j} v(\vec{r}_i, \vec{r}_j) + \frac{1}{6} \sum_{i \neq j \neq k} w(\vec{r}_i, \vec{r}_j, \vec{r}_k) + \ldots \qquad (140) $$

and represents the direct monomer-monomer interactions (such as Coulomb or Van der
Waals interactions) as well as the solvent-induced interactions. The particular case of a
chain with no interactions ($\mathcal{H} = 0$) is called a Brownian chain.

In many polymer applications, one is interested in quantities which do not depend
on the microscopic scale, hence the need for a continuum description. Loosely speaking,
this continuous limit is obtained (in a field theoretic approach) by taking $N \to +\infty$ and
$a \to 0$ in such a way that the product $S = N a^2$, named the Brownian surface of the chain,
remains constant. More rigorous arguments can be found in the book by des Cloizeaux
and Jannink.

From a practical point of view, it is sufficient to replace everywhere the discrete indices
$i, j, k, \ldots = 1, \ldots, N$ of eqs. (139) and (140) by continuous curvilinear abscissae $s, s', s'', \ldots$.
Neglecting curvature and torsion effects, the partition function of the chain can be written
as:

$$ Z = \int \mathcal{D}\vec{r}(s) \, \exp\left( -\frac{d}{2a^2} \int_0^N ds \left(\frac{d\vec{r}}{ds}\right)^2 - \beta\mathcal{H} \right) \qquad (141) $$

with

$$ \beta\mathcal{H} = \frac{1}{2} \int_0^N ds \int_0^N ds' \, v(\vec{r}(s), \vec{r}(s')) + \frac{1}{6} \int_0^N ds \int_0^N ds' \int_0^N ds'' \, w(\vec{r}(s), \vec{r}(s'), \vec{r}(s'')) \qquad (142) $$

In the following, we shall use interchangeably the continuous or discrete versions of the
partition function. However, as we shall see in appendix B, the continuous version is
easier to handle for analytic calculations, since it can be mapped onto an imaginary time
Schrödinger equation.

A2: Solvent-induced interactions

It is physically intuitive that, in a good solvent, a polymer chain will swell so as to
maximize contacts between monomers and solvent molecules. On the contrary, a bad solvent
will lead to a collapse of the chain. One may equally well say that a good (resp. bad)
solvent generates an effective repulsive (resp. attractive) monomer-monomer interaction.
In the context of protein folding, it is widely believed that the interactions between the
hydrophobic residues (Trp, Ile, Phe, ...) are the main driving forces for the collapse of
proteins: hydrophobic residues in water can be thought of as being in a bad solvent, and
thus have an attractive effective interaction.

Let us consider a lattice model, with $N$ monomers and $\mathcal{N}$ solvent molecules,
where each site is occupied either by a monomer or a solvent molecule or a vacancy. The
Hamiltonian of this model reads:

$$ \mathcal{H}_{ms} = \sum_{i=1}^{N} \sum_{\alpha=1}^{\mathcal{N}} a_0(\vec{r}_i - \vec{R}_\alpha) \qquad (143) $$

where $a_0$ denotes a short-ranged monomer-solvent molecule interaction.
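Introducing the monomer and solvent volume concentrations:

$$ \rho_m(\vec{r}) = \sum_{i=1}^{N} \delta(\vec{r} - \vec{r}_i) \qquad (144) $$

and

$$ \rho_s(\vec{r}) = \sum_{\alpha=1}^{\mathcal{N}} \delta(\vec{r} - \vec{R}_\alpha) \qquad (145) $$

the Hamiltonian (143) can be rewritten as:

$$ \mathcal{H}_{ms} = \sum_{\vec{r}, \vec{r}'} \rho_m(\vec{r}) \, a_0(\vec{r} - \vec{r}') \, \rho_s(\vec{r}') \qquad (146) $$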

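Assuming that the system is incompressible (no vacancies), we have, at each site $\vec{r}$:

$$ \rho_s(\vec{r}) + \rho_m(\vec{r}) = 1 \qquad (147) $$

Replacing (147) in (146) we obtain:

$$ \mathcal{H}_{ms} = - \sum_{\vec{r}, \vec{r}'} \rho_m(\vec{r}) \, a_0(\vec{r} - \vec{r}') \, \rho_m(\vec{r}') + \sum_{\vec{r}, \vec{r}'} \rho_m(\vec{r}) \, a_0(\vec{r} - \vec{r}') \qquad (148) $$

The second term in (148) is a constant, equal to $N A_0$, where by definition
$A_0 = \sum_{\vec{r}} a_0(\vec{r})$. It will therefore be omitted.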

Note the change of sign of the first term between (146) and (148). It generates, as
announced above, a contribution to the two-body interaction $v(\vec{r} - \vec{r}')$ of (140). The
good (resp. bad) solvent is characterized by a negative (resp. positive) $a_0(\vec{r})$, and indeed
generates a repulsive (resp. attractive) interaction $-a_0(\vec{r})$ between monomers. In order
to apply these considerations to heteropolymers or proteins, it is necessary to allow for
sequence dependent interactions $a_0$ in equation (143). This will be discussed in section
2.5.
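
The passage from (146) to (148) can be verified on a small example. The sketch below is only an illustration: it assumes a periodic one-dimensional lattice, a random monomer occupancy, and a hypothetical nearest-neighbour coupling `a_nn`, none of which come from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
L = 30                                   # number of sites of a periodic 1D lattice (assumption)
rho_m = rng.integers(0, 2, size=L)       # 1 where a monomer sits, 0 otherwise
rho_s = 1 - rho_m                        # incompressibility, equation (147)

# Hypothetical short-ranged coupling: a0(0) = 0 and a0(+1) = a0(-1) = a_nn
a_nn = -0.7
a0 = np.zeros(L)
a0[1] = a0[-1] = a_nn

# Matrix of a0(r - r') on the ring
idx = np.arange(L)
A = a0[(idx[:, None] - idx[None, :]) % L]

H_146 = rho_m @ A @ rho_s                # monomer-solvent form, equation (146)
N = rho_m.sum()                          # number of monomers
A0 = a0.sum()                            # A_0 = sum over r of a0(r)
H_148 = -(rho_m @ A @ rho_m) + N * A0    # monomer-monomer form, equation (148)
print(H_146, H_148)
```

The two numbers agree for any occupancy, which shows that the monomer-solvent coupling $a_0$ acts as an effective monomer-monomer coupling $-a_0$ up to the constant $N A_0$.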



Appendix B: Some methods of polymer physics 

In this appendix, we review the properties of free Brownian chains, and show their
relation to diffusion and Schrödinger equations. For interacting chains, we present the
self-consistent field approximation (SCF), and show how it can be further simplified when
there is a gap in the energy spectrum, by using the ground state dominance approximation.

B1: Markovian and Brownian chains.



Let us consider a discrete chain, made of $N$ monomers; the first atom of the chain,
numbered 0, is fixed at point $\vec{0}$. By definition of a Markov chain, the position $\vec{r}_n$ of atom
$n$ depends only on the position $\vec{r}_{n-1}$ of the previous atom $n-1$ along the chain. Namely,
a Markov chain is characterized by the conditional probability distribution $g(\vec{r}, \vec{r}')$ that
atom $n$ is at point $\vec{r}$, given that atom $n-1$ is at $\vec{r}'$. If we further assume translational
invariance, this function will only depend on the difference, and reads $g(\vec{r} - \vec{r}')$.

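The probability of finding the end of the chain (atom number $N$) at point $\vec{R}$, given
that the origin is fixed at $\vec{0}$, is thus given by:

$$ G(\vec{R}, N | \vec{0}, 0) = \int d^d r_1 \ldots d^d r_{N-1} \; g(\vec{R} - \vec{r}_{N-1}) \, g(\vec{r}_{N-1} - \vec{r}_{N-2}) \cdots g(\vec{r}_1 - \vec{0}) \qquad (149) $$

The Fourier transform $\Gamma$ of $G$ is thus given by:

$$ \Gamma(\vec{k}, N) = \int d^d R \; e^{i \vec{k} \cdot \vec{R}} \, G(\vec{R}, N | \vec{0}, 0) = \gamma^N(\vec{k}) \qquad (150) $$

where

$$ \gamma(\vec{k}) = \int d^d r \; e^{i \vec{k} \cdot \vec{r}} \, g(\vec{r}) \qquad (151) $$

Performing an inverse Fourier transform, we obtain:

$$ G(\vec{R}, N | \vec{0}, 0) = \int \frac{d^d k}{(2\pi)^d} \; e^{-i \vec{k} \cdot \vec{R}} \, \gamma^N(\vec{k}) \qquad (152) $$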



In the limit of long chains, $N \to \infty$, we apply the stationary-phase method to (152). Since
$g(\vec{r})$ is a probability distribution, its integral is one, and thus it is easy to see that:

$$ |\gamma(\vec{k})| \le \gamma(\vec{0}) = 1 \qquad (153) $$

so that we may expand (152) around $\vec{k} = \vec{0}$. We can write, to second order in $k$:

$$ \gamma(\vec{k}) = 1 - \frac{k^2 a^2}{2d} \qquad (154) $$

where $a$, which has the dimension of a length, is called the Kuhn length, and can be
interpreted as the effective monomer length. We have:

$$ G(\vec{R}, N | \vec{0}, 0) = \int \frac{d^d k}{(2\pi)^d} \exp\left( -i \vec{k} \cdot \vec{R} - \frac{N k^2 a^2}{2d} \right) = \left( \frac{d}{2\pi N a^2} \right)^{d/2} e^{-\frac{d R^2}{2 N a^2}} \qquad (155) $$

We see that, for long enough chains, regardless of the function $g(\vec{r})$, the asymptotic
probability distribution for the end of the chain is Gaussian. As an outcome, we see that
the entropy (more precisely the entropy reduction) of a chain, with origin constrained at
$\vec{0}$ and extremity constrained at $\vec{R}$, is given by:

$$ S = -\frac{d R^2}{2 N a^2} \qquad (156) $$

a result that we shall frequently use.

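The expression (155) of $G(\vec{R}, N | \vec{0}, 0)$ shows that it satisfies a diffusion-like equation:

$$ \left( \frac{\partial}{\partial N} - \frac{a^2}{2d} \nabla_{\vec{R}}^2 \right) G(\vec{R}, N | \vec{0}, 0) = \delta(\vec{R}) \, \delta(N) \qquad (157) $$

which is nothing but an imaginary-time Schrödinger equation. Therefore, using the
standard notations of quantum mechanics, we see that we can write:

$$ G(\vec{R}, N | \vec{0}, 0) = \langle \vec{R} | e^{-N \mathcal{H}} | \vec{0} \rangle \qquad (158) $$

where $\mathcal{H}$ is a quantum-like Hamiltonian equal to:

$$ \mathcal{H} = -\frac{a^2}{2d} \nabla^2 \qquad (159) $$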
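
As a quick sanity check, the Gaussian propagator can be inserted into the diffusion-like equation away from the source point, where the right-hand side vanishes. The sketch below uses central finite differences; the values of $R$, $N$, the step `h`, and the choice $d = 3$ are illustrative assumptions.

```python
import numpy as np

def G(R, N, a=1.0, d=3):
    """Gaussian end-to-end distribution as a function of the distance R = |R|."""
    return (d / (2.0 * np.pi * N * a**2)) ** (d / 2.0) * np.exp(-d * R**2 / (2.0 * N * a**2))

a, d = 1.0, 3
R, N, h = 4.0, 50.0, 1e-3

# derivative with respect to the chain length N, which plays the role of time
dG_dN = (G(R, N + h, a, d) - G(R, N - h, a, d)) / (2.0 * h)

# Laplacian of a radial function in d dimensions: G'' + (d - 1) / R * G'
d2G = (G(R + h, N, a, d) - 2.0 * G(R, N, a, d) + G(R - h, N, a, d)) / h**2
dG = (G(R + h, N, a, d) - G(R - h, N, a, d)) / (2.0 * h)
laplacian = d2G + (d - 1) / R * dG

residual = dG_dN - a**2 / (2.0 * d) * laplacian
print(dG_dN, residual)
```

The residual is smaller than the individual terms by many orders of magnitude, limited only by the finite-difference step.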

It is well-known in quantum mechanics that matrix elements of the form (158) have a
path-integral representation. We may thus write:

$$ G(\vec{R}, N | \vec{0}, 0) = \int_{\vec{r}(0) = \vec{0}}^{\vec{r}(N) = \vec{R}} \mathcal{D}\vec{r}(s) \, \exp\left( -\frac{d}{2a^2} \int_0^N ds \, \dot{\vec{r}}^{\,2}(s) \right) \qquad (160) $$

This continuous chain is called a Brownian chain. It is the prototype of all long chains
defined by Markov processes.
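
The universality of the Brownian chain can be illustrated by a small Monte Carlo experiment. The sketch below compares two link distributions, rigid links of length $a$ and Gaussian links of mean square length $a^2$; the sample sizes, the seed, and the helper name `end_to_end` are illustrative assumptions. In both cases the Gaussian law predicts $\langle R^2 \rangle = N a^2$ and $\langle R_x^2 \rangle = N a^2 / d$.

```python
import numpy as np

def end_to_end(kind, n_chains, N, a, d, rng):
    """Sample end-to-end vectors R of n_chains independent chains with N links."""
    steps = rng.normal(size=(n_chains, N, d))
    if kind == "rod":
        # links of fixed length a and isotropic random direction
        steps *= a / np.linalg.norm(steps, axis=-1, keepdims=True)
    elif kind == "gaussian":
        # Gaussian links with mean square length a**2
        steps *= a / np.sqrt(d)
    else:
        raise ValueError(kind)
    return steps.sum(axis=1)

rng = np.random.default_rng(2)
n_chains, N, a, d = 20000, 50, 1.0, 3
results = {}
for kind in ("rod", "gaussian"):
    R = end_to_end(kind, n_chains, N, a, d, rng)
    results[kind] = {
        "R2": np.mean(np.sum(R**2, axis=1)),   # expected value: N * a**2
        "Rx2": np.mean(R[:, 0] ** 2),          # expected value: N * a**2 / d
    }
    print(kind, results[kind])
```

Both link choices give the same second moments within statistical error, even though their single-link distributions are very different.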

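These expressions can easily be generalized to the case of a chain in an external
potential. Assuming that each atom at $\vec{r}_n$ feels the potential $U(\vec{r}_n)$, the distribution function
of the chain at inverse temperature $\beta$ reads:

$$ G(\vec{R}, N | \vec{0}, 0) = \int_{\vec{r}(0) = \vec{0}}^{\vec{r}(N) = \vec{R}} \mathcal{D}\vec{r}(s) \, \exp\left( -\frac{d}{2a^2} \int_0^N ds \, \dot{\vec{r}}^{\,2}(s) - \beta \int_0^N ds \, U(\vec{r}(s)) \right) \qquad (161) $$

It has a form similar to (158) with

$$ \mathcal{H} = -\frac{a^2}{2d} \nabla^2 + \beta U(\vec{r}) \qquad (162) $$

and satisfies the Schrödinger equation:

$$ \left( \frac{\partial}{\partial N} - \frac{a^2}{2d} \nabla_{\vec{R}}^2 + \beta U(\vec{R}) \right) G(\vec{R}, N | \vec{0}, 0) = \delta(\vec{R}) \, \delta(N) \qquad (163) $$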



The solution of (163) can easily be expressed in terms of the eigenstates and eigenvalues of
the quantum Hamiltonian $\mathcal{H}$. Denoting by $\Psi_n(\vec{r})$ the eigenfunction of $\mathcal{H}$ with eigenvalue
$E_n$, which satisfies the equation:

$$ \mathcal{H} \Psi_n(\vec{r}) = E_n \Psi_n(\vec{r}) \qquad (164) $$

we can rewrite:

$$ G(\vec{R}, N | \vec{0}, 0) = \sum_{\{n\}} e^{-N E_n} \, \Psi_n(\vec{R}) \, \Psi_n(\vec{0}) \qquad (165) $$

B2: Ground state dominance.



An important simplification occurs when:

(i) the length $N$ goes to $\infty$.

(ii) there is a gap $\Delta$ in the energy spectrum of $\mathcal{H}$, such that $N\Delta \gg 1$.

(iii) there is not an exponentially large number of excited states in this spectrum.

Indeed, in this case, the sum in equation (165) is dominated by the ground state
with energy $E_0$, and the distribution function reduces to:

$$ G(\vec{R}, N | \vec{0}, 0) \simeq e^{-N E_0} \, \Psi_0(\vec{R}) \, \Psi_0(\vec{0}) \qquad (166) $$

Conditions (i) and (ii) imply that, for a long enough chain, the Hamiltonian $\mathcal{H}$ has bound
states. This is the case for the adsorption of a polymer chain on an impenetrable wall.

If condition (iii) is not satisfied, or if the spectrum is continuous with no finite gap,
one has to solve the full "time"-dependent Schrödinger equation.
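
The quality of the ground state dominance approximation can be probed on a discretized toy problem. The sketch below is an illustration under stated assumptions: one dimension ($d = 1$, $a = 1$), a hypothetical attractive square well for $\beta U$, hard walls at the ends of the box, and a finite-difference Laplacian. It compares the full spectral sum with its ground state term for a short and a long chain.

```python
import numpy as np

a, d = 1.0, 1
n = 401
x = np.linspace(-20.0, 20.0, n)
h = x[1] - x[0]
beta_U = np.where(np.abs(x) < 1.05, -1.0, 0.0)   # hypothetical attractive square well

# finite-difference version of H = -(a^2 / 2d) d^2/dx^2 + beta*U, hard walls at the box ends
kin = a**2 / (2.0 * d) / h**2
H = np.diag(2.0 * kin + beta_U) - kin * (np.eye(n, k=1) + np.eye(n, k=-1))
E, vec = np.linalg.eigh(H)
psi = vec / np.sqrt(h)             # psi[:, m] is normalized so that sum(psi**2) * h = 1

def full_sum(N, i_R, i_0):
    """Sum over all eigenstates of exp(-N E_n) psi_n(R) psi_n(0)."""
    return np.sum(np.exp(-N * E) * psi[i_R] * psi[i_0])

def ground_term(N, i_R, i_0):
    """Ground state contribution alone."""
    return np.exp(-N * E[0]) * psi[i_R, 0] * psi[i_0, 0]

def relative_error(N, i_R, i_0):
    exact = full_sum(N, i_R, i_0)
    return abs(exact - ground_term(N, i_R, i_0)) / abs(exact)

i_0 = n // 2            # grid point x = 0
i_R = i_0 + 50          # grid point x = 5
gap = E[1] - E[0]
err_short = relative_error(1.0, i_R, i_0)    # N * gap of order one
err_long = relative_error(50.0, i_R, i_0)    # N * gap much larger than one
print(gap, err_short, err_long)
```

For the long chain the single bound state reproduces the full sum essentially to machine precision, whereas for the short chain the excited states dominate the propagator far from the well.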

To conclude this section, let us recall that the ground state of a Hamiltonian can be
deduced from a variational principle, namely the Rayleigh-Ritz principle. This principle
states that the ground state energy $E_0$ of a Hamiltonian $\mathcal{H}$ is given by:

$$ E_0 = \min_{\{\Psi\}} \frac{\langle \Psi | \mathcal{H} | \Psi \rangle}{\langle \Psi | \Psi \rangle} \qquad (167) $$

where the minimization runs over the space of square-integrable functions. Using the
energy as a Lagrange multiplier to enforce normalization, it is equivalent to minimize the
functional $\mathcal{L}$:

$$ \mathcal{L} = \int d^d r \; \Psi(\vec{r}) \left( -\frac{a^2}{2d} \nabla^2 + \beta U(\vec{r}) - E_0 \right) \Psi(\vec{r}) \qquad (168) $$
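
The variational principle can be tried out on a case where the answer is known. The sketch below assumes a one-dimensional harmonic confinement $\beta U = x^2/2$ with $a^2/2d = 1/2$ (a textbook choice, not taken from the text) and scans Gaussian trial functions of width $\sigma$; in the continuum the quotient is $1/(4\sigma^2) + \sigma^2/4$, minimal at $\sigma = 1$ with value $1/2$.

```python
import numpy as np

n = 401
x = np.linspace(-10.0, 10.0, n)
h = x[1] - x[0]
beta_U = 0.5 * x**2                       # hypothetical harmonic confinement
kin = 0.5 / h**2                          # a^2 / (2 d) = 1/2, i.e. a = 1 and d = 1
H = np.diag(2.0 * kin + beta_U) - kin * (np.eye(n, k=1) + np.eye(n, k=-1))

def rayleigh_quotient(trial):
    """<trial|H|trial> / <trial|trial> for a trial function sampled on the grid."""
    return trial @ H @ trial / (trial @ trial)

sigmas = np.linspace(0.5, 2.0, 31)
energies = np.array([rayleigh_quotient(np.exp(-x**2 / (2.0 * s**2))) for s in sigmas])
E0 = np.linalg.eigvalsh(H)[0]             # lowest eigenvalue of the discretized Hamiltonian
best = np.argmin(energies)
print(sigmas[best], energies[best], E0)
```

Every trial width gives an upper bound on the ground state energy, and the best Gaussian reproduces it here because the exact ground state is itself Gaussian.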



B3: Self consistent field approximation. 

The previous sections have shown how one can generate path integral representations for
polymer problems. In the case of a self-interacting chain, however, there is no equivalent
to the Schrödinger equation, and one has to resort to approximations to solve the
problem. One such powerful approximation was devised by Edwards and is called the
self-consistent field approximation (SCF). We first illustrate the method by the case of a
chain in a bad solvent.

SCF with ground state dominance

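The partition function of this model reads:

$$ Z = \int \mathcal{D}\vec{r}(s) \, \exp\left( -\frac{d}{2a^2} \int_0^N ds \, \dot{\vec{r}}^{\,2} \right) \exp\left( \frac{v}{2} \int_0^N ds \int_0^N ds' \, \delta(\vec{r}(s) - \vec{r}(s')) \right) $$
$$ \times \exp\left( -\frac{w}{6} \int_0^N ds \int_0^N ds' \int_0^N ds'' \, \delta(\vec{r}(s) - \vec{r}(s')) \, \delta(\vec{r}(s) - \vec{r}(s'')) \right) \qquad (169) $$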



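It is very useful to make a change of variable in (169), so that the partition function of
the problem is expressed as a functional integral over all possible monomer concentrations
$\{\rho(\vec{r})\}$. To do so, we enforce the variables $\{\rho(\vec{r})\}$ by inserting the identity:

$$ 1 = \int \mathcal{D}\phi(\vec{r}) \, \mathcal{D}\rho(\vec{r}) \, \exp\left( i \int d^d r \, \phi(\vec{r}) \rho(\vec{r}) - i \int_0^N ds \, \phi(\vec{r}(s)) \right) \qquad (170) $$

which expresses that:

$$ \rho(\vec{r}) = \int_0^N ds \; \delta(\vec{r} - \vec{r}(s)) \qquad (171) $$

Inserting these identities in (169) yields:

$$ Z = \int \mathcal{D}\phi(\vec{r}) \, \mathcal{D}\rho(\vec{r}) \, \exp\left( i \int d^d r \, \phi(\vec{r}) \rho(\vec{r}) + \frac{v}{2} \int d^d r \, \rho^2(\vec{r}) - \frac{w}{6} \int d^d r \, \rho^3(\vec{r}) \right) \zeta(\phi) \qquad (172) $$

where

$$ \zeta(\phi) = \int \mathcal{D}\vec{r}(s) \, \exp\left( -\frac{d}{2a^2} \int_0^N ds \, \dot{\vec{r}}^{\,2} - i \int_0^N ds \, \phi(\vec{r}(s)) \right) \qquad (173) $$

According to (161), $\zeta$ can be expressed as:

$$ \zeta(\phi) = \int d^d r \, \langle \vec{r} | e^{-N \mathcal{H}} | \vec{0} \rangle \qquad (174) $$

where

$$ \mathcal{H} = -\frac{a^2}{2d} \nabla^2 + i \phi(\vec{r}) \qquad (175) $$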

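In a bad solvent, we expect the chain to collapse at some temperature, and therefore there
should be some bound state in the system. We thus assume ground state dominance, so
that:

$$ \zeta(\phi) = \Psi_0(\vec{0}) \left( \int d^d r \, \Psi_0(\vec{r}) \right) \exp(-N E_0) \qquad (176) $$

where $\Psi_0$ is the ground state of $\mathcal{H}$, with energy $E_0$. Using equation (167), we see
that the extensive part of $\zeta$ can be written as:

$$ \zeta = \exp\left( -N \min_{\{\Psi\}} \frac{\langle \Psi | \mathcal{H} | \Psi \rangle}{\langle \Psi | \Psi \rangle} \right) \qquad (177) $$

or equivalently

$$ \zeta = \exp\left( -N \min_{\{\Psi\}} \left\{ \int d^d r \, \Psi(\vec{r}) \left( -\frac{a^2}{2d} \nabla^2 + i \phi(\vec{r}) \right) \Psi(\vec{r}) - E_0 \left( \int d^d r \, \Psi^2(\vec{r}) - 1 \right) \right\} \right) \qquad (178) $$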



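where $E_0$ appears as a Lagrange multiplier which constrains the norm of $\Psi$ to 1.
Replacing (178) in (172) yields:

$$ Z = \int \mathcal{D}\phi(\vec{r}) \, \mathcal{D}\rho(\vec{r}) \, \exp\left( -N \mathcal{F}(\rho, \phi) \right) \qquad (179) $$

where

$$ N \mathcal{F}(\rho, \phi) = -i \int d^d r \, \phi(\vec{r}) \rho(\vec{r}) - \frac{v}{2} \int d^d r \, \rho^2(\vec{r}) + \frac{w}{6} \int d^d r \, \rho^3(\vec{r}) $$
$$ + N \min_{\{\Psi\}} \left\{ \int d^d r \, \Psi(\vec{r}) \left( -\frac{a^2}{2d} \nabla^2 + i \phi(\vec{r}) \right) \Psi(\vec{r}) - E_0 \left( \int d^d r \, \Psi^2(\vec{r}) - 1 \right) \right\} \qquad (180) $$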



Since (179) cannot be evaluated exactly, we must use some approximation. A natural
approximation is the saddle-point method (SPM), which consists in expanding $\mathcal{F}(\rho, \phi)$
around its minimum.

The minimization equations with respect to $\phi(\vec{r})$, $\rho(\vec{r})$ and $\Psi(\vec{r})$ read:

$$ \rho(\vec{r}) = N \Psi^2(\vec{r}) $$
$$ i \phi(\vec{r}) = -v \rho(\vec{r}) + \frac{w}{2} \rho^2(\vec{r}) $$
$$ E_0 \Psi(\vec{r}) = \left( -\frac{a^2}{2d} \nabla^2 + i \phi(\vec{r}) \right) \Psi(\vec{r}) \qquad (181) $$

The first equation expresses that the monomer concentration is $N$ times the square of the
normalized wave-function. The second equation expresses the mean-field potential seen
by each monomer, and finally the last equation can be recast in the form:

$$ \left( -\frac{a^2}{2d} \nabla^2 - v N \Psi^2(\vec{r}) + \frac{w}{2} N^2 \Psi^4(\vec{r}) \right) \Psi(\vec{r}) = E_0 \Psi(\vec{r}) \qquad (182) $$

where the energy $E_0$ is chosen so that the square of the wave function is normalized. The
above equation is a non-linear Schrödinger equation, which cannot in general be solved
analytically. Let us note however that the two-body term $v$ plays the role of an attractive
self-consistent field, whereas $w$ plays the role of a repulsive one, which forbids the collapse
onto a finite region. In dimension larger than 2, this equation will have a bound state if
$v$ is large enough, and the associated energy will be finite, even in the limit $N \to \infty$.

To get more analytic information, we can restrict the space of normalized wave
functions $\{\Psi(\vec{r})\}$ to the set of Gaussian wave-functions, depending on a single parameter $R$,
which measures the spatial extent of the polymer globule:




Replacing (183) in (180) yields the simple equation for $R$:

ao N , ^ 

where $a_0$, $a_2$ and $a_3$ are simple numerical constants. This equation shows that for large
$N$ and any attractive two-body interaction, the system is collapsed in a globular state,
with finite concentration, with exponent $\nu = 1/d$. Note that this implies that the kinetic
energy term yields (for $d > 2$) a vanishing contribution in the collapsed phase. Since
this term is directly linked to the chain constraint (it is in fact the entropy loss due to
the collapse transition), we can see that the above treatment is not fully satisfactory. It
does not describe well the extensive conformational entropy of the collapsed phase.
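
The collapse exponent can be recovered from a simple scaling form of the variational free energy. The sketch below is not the authors' equation for $R$: it assumes that a Gaussian trial function of width $R$ gives a free energy per monomer $f(R) = a_0/R^2 - a_2 v N / R^d + a_3 w N^2 / R^{2d}$ (kinetic, two-body and three-body terms), with all constants set to 1 for illustration, and minimizes it numerically for several chain lengths.

```python
import numpy as np

def optimal_radius(N, d=3, v=1.0, w=1.0, a0=1.0, a2=1.0, a3=1.0):
    """Radius minimizing the assumed Gaussian-trial free energy per monomer."""
    R = np.logspace(-1.0, 4.0, 200001)
    f = a0 / R**2 - a2 * v * N / R**d + a3 * w * N**2 / R ** (2 * d)
    return R[np.argmin(f)]

d = 3
Ns = np.array([1e3, 1e4, 1e5, 1e6])
Rs = np.array([optimal_radius(N, d=d) for N in Ns])
nu = np.polyfit(np.log(Ns), np.log(Rs), 1)[0]   # slope of log R versus log N
density = Ns / Rs**d                            # should approach a constant
print(nu, density)
```

The fitted slope is close to $1/d$ and the density $N / R^d$ tends to a constant set by the balance of the two-body and three-body terms, while the kinetic term only produces a correction that vanishes at large $N$.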
SCF without ground state dominance

We now consider the case where the ground state dominance approximation is not valid.
Note that this may arise in (at least) two ways: either one may have to deal with a
continuous spectrum for the Hamiltonian previously defined, or one may be faced with
an exponentially large number of metastable states above the ground state. The former
possibility is met in the self-avoiding chain; the latter has not been encountered yet, but
should be present in some heteropolymer problems (one may have a "non-zero complexity",
as in the p-spin glass models p > 3). An example of such a behavior is probably the
coil phase "with metastable states" of section 2.5. For the sake of simplicity, we only
include two-body interactions in the Hamiltonian. In that case, one cannot use (176). One
can still write saddle point equations for the variables $\phi(\vec{r})$ and $\rho(\vec{r})$ in equation (172).
These equations in turn yield a self consistent equation for $\rho(\vec{r})$. We have chosen here a
slightly different presentation, to establish a possible connection with the spin glass TAP
equations. The partition function reads

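$$ Z = \int \mathcal{D}\vec{r}(s) \, \exp\left( -\frac{d}{2a^2} \int_0^N ds \, \dot{\vec{r}}^{\,2} - \frac{1}{2} \int_0^N ds \int_0^N ds' \, v(s, s') \, \delta(\vec{r}(s) - \vec{r}(s')) \right) \qquad (185) $$

and may be rewritten through a Hubbard-Stratonovich transformation

$$ Z = \int \mathcal{D}\Phi(\vec{r}, s) \, \exp\left( -\frac{1}{2} \int d^d r \int ds \int ds' \, \Phi(\vec{r}, s) \, v^{-1}(s, s') \, \Phi(\vec{r}, s') \right) $$
$$ \times \int \mathcal{D}\vec{r}(s) \, \exp\left( -\frac{d}{2a^2} \int ds \, \dot{\vec{r}}^{\,2} \right) \exp\left( -i \int ds \, \Phi(\vec{r}(s), s) \right) \qquad (186) $$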
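
The Gaussian identity behind this transformation can be checked for a single mode. The sketch below replaces the field $\Phi(\vec{r}, s)$ and the kernel $v(s, s')$ by one real variable and one positive number (a deliberate simplification), and verifies numerically that $e^{-v x^2 / 2} = (2\pi v)^{-1/2} \int d\Phi \, e^{-\Phi^2 / (2v) - i \Phi x}$; the values of `v` and `x` are arbitrary.

```python
import numpy as np

def hs_integral(x, v, n=4001, cutoff=12.0):
    """(2 pi v)^(-1/2) * integral dPhi exp(-Phi^2 / (2 v) - i Phi x), by a uniform-grid sum."""
    half_width = cutoff * np.sqrt(v)
    phi = np.linspace(-half_width, half_width, n)
    dphi = phi[1] - phi[0]
    integrand = np.exp(-phi**2 / (2.0 * v) - 1j * phi * x)
    return np.sum(integrand) * dphi / np.sqrt(2.0 * np.pi * v)

v, x = 0.8, 1.3
lhs = np.exp(-0.5 * v * x**2)     # weight of a repulsive quadratic term
rhs = hs_integral(x, v)           # same weight, written as an average over the auxiliary field
print(lhs, rhs)
```

The quadratic term in $x$ is traded for a term linear in $x$ coupled to a fluctuating auxiliary variable, which is what decouples the monomers in the polymer integral.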

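The polymer integral may be rewritten as a Feynman path integral analogous to (160),
with both ends free. We have

$$ Z = \int \mathcal{D}\Phi(\vec{r}, s) \, \exp\left( -\frac{1}{2} \int d^d r \int_0^N ds \int_0^N ds' \, \Phi(\vec{r}, s) \, v^{-1}(s, s') \, \Phi(\vec{r}, s') \right) $$
$$ \times \int d^d r \int d^d r' \; G(\vec{r}, N | \vec{r}', 0) \qquad (187) $$

where the matrix element $G(\vec{r}, N | \vec{r}', 0)$ satisfies the equation

$$ \left( \frac{\partial}{\partial N} - \frac{a^2}{2d} \nabla_{\vec{r}}^2 + i \Phi(\vec{r}, N) \right) G(\vec{r}, N | \vec{r}', 0) = \delta(\vec{r} - \vec{r}') \, \delta(N) \qquad (188) $$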

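Due to the first order character of this equation with respect to the "time" $N$, one may
rewrite the matrix element $G$ as

$$ G(\vec{r}, N | \vec{r}', 0) = \int \mathcal{D}\Psi(\vec{\rho}, s) \int \mathcal{D}\Psi^*(\vec{\rho}, s) \; \Psi(\vec{r}, N) \, \Psi^*(\vec{r}', 0) $$
$$ \times \exp\left( -\int d^d \rho \int_0^N ds \; \Psi^*(\vec{\rho}, s) \left( \frac{\partial}{\partial s} + \mathcal{H}_0 + i \Phi(\vec{\rho}, s) \right) \Psi(\vec{\rho}, s) \right) \qquad (189) $$

where $\mathcal{H}_0 = -\frac{a^2}{2d} \nabla^2$. Plugging back equation (189) in (187), one may now integrate over
the $\Phi(\vec{\rho}, s)$. We obtain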



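$$ Z = \int \mathcal{D}\Psi \, \mathcal{D}\Psi^* \int d^d r \int d^d r' \; \Psi(\vec{r}, N) \, \Psi^*(\vec{r}', 0) $$
$$ \times \exp\left( -\frac{1}{2} \int d^d \rho \int_0^N ds \int_0^N ds' \, |\Psi(\vec{\rho}, s)|^2 \, v(s, s') \, |\Psi(\vec{\rho}, s')|^2 \right) $$
$$ \times \exp\left( -\int d^d \rho \int_0^N ds \; \Psi^*(\vec{\rho}, s) \left( \frac{\partial}{\partial s} - \frac{a^2}{2d} \nabla^2 \right) \Psi(\vec{\rho}, s) \right) \qquad (190) $$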



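where the shorthand notation $|\Psi(\vec{\rho}, s)|^2 = \Psi(\vec{\rho}, s) \Psi^*(\vec{\rho}, s)$ was used. We may now
perform the saddle point method with respect to both $\Psi$ and $\Psi^*$. We have for instance

$$ \left( \frac{\partial}{\partial s} - \frac{a^2}{2d} \nabla^2 + \int_0^N ds' \, v(s, s') \, |\Psi(\vec{\rho}, s')|^2 \right) \Psi(\vec{\rho}, s) = 0 \qquad (191) $$

with the boundary condition $\Psi(\vec{r}, 0) = \delta(\vec{r})$, and a similar equation for $\Psi^*$.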

The addition of three (and more) body interactions is straightforward. In principle,
these equations may be solved. Edwards has studied the case of the self avoiding chain
and obtained the Flory value of the swelling exponent $\nu = 3/5$. More recently, this method
has been used to find the phase diagram of (non random) block copolymer
melts. In the presence of disorder, these equations are certainly difficult to solve: they are
the equivalent of the spin glass TAP equations, which have not been solved numerically
so far.